HUMAN GENES AND GENE EXPRESSION PRODUCTS
ISOLATED FROM HUMAN PROSTATE 

Field of the Invention 

The present invention relates to polynucleotides of human origin, particularly in human
prostate, and the encoded gene products.

Background of the Invention 

Identification of novel polynucleotides, particularly those that encode an expressed gene
product, is important in the advancement of drug discovery, diagnostic technologies, and the
understanding of the progression and nature of complex diseases such as cancer. Identification of
genes expressed in different cell types isolated from sources that differ in disease state or stage,
developmental stage, exposure to various environmental factors, the tissue of origin, the species from
which the tissue was isolated, and the like is key to identifying the genetic factors that are responsible
for the phenotypes associated with these various differences.

This invention provides novel human polynucleotides, the polypeptides encoded by these
polynucleotides, and the genes and proteins corresponding to these novel polynucleotides.

Summary of the Invention

This invention relates to novel human polynucleotides and variants thereof, their encoded
polypeptides and variants thereof, to genes corresponding to these polynucleotides and to proteins
expressed by the genes. The invention also relates to diagnostics and therapeutics comprising such
novel human polynucleotides, their corresponding genes or gene products, including probes, antisense
nucleotides, and antibodies. The polynucleotides of the invention correspond to a polynucleotide
comprising the sequence information of at least one of SEQ ID NOS: 1-1485. The polypeptides of the
invention correspond to a polypeptide comprising the amino acid sequence information of at least one
of SEQ ID NOS: 1486-1542.

Accordingly, in one aspect, the invention provides an isolated polynucleotide comprising a 
nucleotide sequence which hybridizes under stringent conditions to a sequence selected from the 
group consisting of SEQ ID NOS: 1-1485. 

In another aspect, the invention provides an isolated polynucleotide comprising at least 15
contiguous nucleotides of a nucleotide sequence having at least 90% sequence identity to a sequence
selected from the group consisting of SEQ ID NOS: 1-1485, a degenerate variant of SEQ ID NOS: 1-1485,
an antisense of SEQ ID NOS: 1-1485, and a complement of SEQ ID NOS: 1-1485.

In another aspect, the invention provides an isolated polynucleotide comprising at least 15
contiguous nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID
NOS: 1-1485, a degenerate variant of SEQ ID NOS: 1-1485, an antisense of SEQ ID NOS: 1-1485, and
a complement of SEQ ID NOS: 1-1485. In specific embodiments, the polynucleotide comprises at
least 100 contiguous nucleotides of the nucleotide sequence. In other specific embodiments, the
polynucleotide comprises at least 200 contiguous nucleotides of the nucleotide sequence.

In another aspect, the invention provides an isolated polynucleotide comprising a nucleotide
sequence of at least 90% sequence identity to a sequence selected from the group consisting of SEQ
ID NOS: 1-1485, a degenerate variant of SEQ ID NOS: 1-1485, an antisense of SEQ ID NOS: 1-1485,
and a complement of SEQ ID NOS: 1-1485. In specific embodiments, the polynucleotide comprises a
nucleotide sequence of at least 95% sequence identity to the selected nucleotide sequence. In other
specific embodiments, the polynucleotide comprises a nucleotide sequence that is identical to the
selected nucleotide sequence.

In another aspect, the invention provides a polynucleotide comprising a nucleotide sequence
of an insert contained in a clone deposited as NRRL Accession No. B-30523, B-30524, B-30525,
B-30526, B-30527, B-30528, B-30529, or B-30581.

In another aspect, the invention provides an isolated cDNA obtained by the process of
amplification using a polynucleotide comprising at least 15 contiguous nucleotides of a nucleotide
sequence selected from the group consisting of SEQ ID NOS: 1-1485. In specific embodiments, the
polynucleotide comprises at least 25 contiguous nucleotides of the selected nucleotide sequence. In
other specific embodiments, the polynucleotide comprises at least 100 contiguous nucleotides of the
selected nucleotide sequence. In some embodiments, the amplification is by polymerase chain
reaction (PCR) amplification.

In another aspect, the invention provides an isolated recombinant host cell containing a
polynucleotide of the invention.

In another aspect, the invention provides an isolated vector comprising a polynucleotide of the 

invention. 

In another aspect, the invention provides a method for producing a polypeptide, the method
comprising the steps of culturing a recombinant host cell containing a polynucleotide of the invention
under conditions suitable for the expression of an encoded polypeptide and recovering the polypeptide
from the host cell culture.

In another aspect, the invention provides an isolated polypeptide encoded by a polynucleotide
of the invention.

In another aspect, the invention provides an isolated polypeptide comprising an amino acid
sequence selected from the group consisting of SEQ ID NOS: 1486-1542.

In another aspect, the invention provides an antibody that specifically binds a polypeptide of
the invention.

In another aspect, the invention provides a method of detecting differentially expressed genes
correlated with a cancerous state of a mammalian cell, the method comprising the step of detecting at
least one differentially expressed gene product in a test sample derived from a cell suspected of being
cancerous, where the gene product is encoded by a gene comprising an identifying sequence of at least
one of SEQ ID NOS: 1-1485. Detection of the differentially expressed gene product is correlated with
a cancerous state of the cell from which the test sample was derived.

In another aspect, the invention provides a method of detecting differentially expressed genes
correlated with a cancerous state of a mammalian cell, the method comprising the step of detecting at
least one differentially expressed gene product in a test sample derived from a cell suspected of being
cancerous, where the gene product comprises an amino acid sequence selected from the group
consisting of SEQ ID NOS: 1486-1542. Detection of the differentially expressed gene product is
correlated with a cancerous state of the cell from which the test sample was derived.

In another aspect, the invention provides a library of polynucleotides, wherein at least one of
the polynucleotides comprises the sequence information of a polynucleotide of the invention. In
specific embodiments, the library is provided on a nucleic acid array. In some embodiments, the
library is provided in a computer-readable format.

In another aspect, the invention provides a method of inhibiting tumor growth by modulating
expression of a gene product, the gene product being encoded by a gene identified by a sequence
selected from the group consisting of SEQ ID NOS: 1-1485.

In another aspect, the invention provides a method of inhibiting tumor growth by modulating
expression of a gene product, the gene product comprising an amino acid sequence selected from the
group consisting of SEQ ID NOS: 1486-1542.

These and other objects, advantages, and features of the invention will become apparent to
those persons skilled in the art upon reading the details of the invention as more fully described below.

Detailed Description of the Invention

Before the present invention is described, it is to be understood that this invention is not 
limited to particular embodiments described, as such may, of course, vary. It is also to be understood 
that the terminology used herein is for the purpose of describing particular embodiments only, and is
not intended to be limiting. 

Unless defined otherwise, all technical and scientific terms used herein have the same 
meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. 
Although any methods and materials similar or equivalent to those described herein can be used in the 
practice or testing of the present invention, the preferred methods and materials are now described.

All publications and patent applications cited in this specification are herein incorporated by 
reference as if each individual publication or patent application were specifically and individually 
indicated to be incorporated by reference. The citation of any publication is for its disclosure prior to 
the filing date and should not be construed as an admission that the present invention is not entitled to 
antedate such publication by virtue of prior invention.






WO 2004/039943 



PCT/LS2003/015465 



It must be noted that as used herein and in the appended claims, the singular forms "a," "an,"
and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example,
reference to "a polynucleotide" includes a plurality of such polynucleotides and reference to "the colon
cancer cell" includes reference to one or more cells and equivalents thereof known to those skilled in
the art, and so forth.

The publications and applications discussed herein are provided solely for their disclosure 
prior to the filing date of the present application. Nothing herein is to be construed as an admission 
that the present invention is not entitled to antedate such publication by virtue of prior invention. 
Further, the dates of publication provided may be different from the actual publication dates, which
may need to be independently confirmed.

Definitions 

The terms "polynucleotide" and "nucleic acid," used interchangeably herein, refer to a
polymeric form of nucleotides of any length, either ribonucleotides or deoxynucleotides. Thus, these
terms include, but are not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA,
cDNA, DNA-RNA hybrids, branched nucleic acid (see, e.g., U.S. Pat. Nos. 5,124,246; 5,710,264;
and 5,849,481), or a polymer comprising purine and pyrimidine bases or other natural, chemically or
biochemically modified, non-natural, or derivatized nucleotide bases. These terms further include, but
are not limited to, mRNA or cDNA that comprise intronic sequences (see, e.g., Niwa et al. (1999) Cell
99(7):691-702). The backbone of the polynucleotide can comprise sugars and phosphate groups (as
may typically be found in RNA or DNA), or modified or substituted sugar or phosphate groups.
Alternatively, the backbone of the polynucleotide can comprise a polymer of synthetic subunits such
as phosphoramidites and thus can be an oligodeoxynucleoside phosphoramidate or a mixed
phosphoramidate-phosphodiester oligomer. Peyrottes et al. (1996) Nucl. Acids Res. 24:1841-1848;
Chaturvedi et al. (1996) Nucl. Acids Res. 24:2318-2323. A polynucleotide may comprise modified
nucleotides, such as methylated nucleotides and nucleotide analogs, uracil, other sugars, and linking
groups such as fluororibose and thioate, and nucleotide branches. The sequence of nucleotides may
be interrupted by non-nucleotide components. A polynucleotide may be further modified after
polymerization, such as by conjugation with a labeling component. Other types of modifications
included in this definition are caps, substitution of one or more of the naturally occurring nucleotides
with an analog, and introduction of means for attaching the polynucleotide to proteins, metal ions,
labeling components, other polynucleotides, or a solid support.

The terms "polypeptide" and "protein," used interchangeably herein, refer to a polymeric form
of amino acids of any length, which can include coded and non-coded amino acids, chemically or
biochemically modified or derivatized amino acids, and polypeptides having modified peptide
backbones. The term includes fusion proteins, including, but not limited to, fusion proteins with a
heterologous amino acid sequence, fusions with heterologous and homologous leader sequences, with
or without N-terminal methionine residues; immunologically tagged proteins; and the like.

"Diagnosis" as used herein generally includes determination of a subject's susceptibility to a 
disease or disorder, determination as to whether a subject is presently affected by a disease or disorder,
prognosis of a subject affected by a disease or disorder (e.g., identification of pre-metastatic or
metastatic cancerous states, stages of cancer, or responsiveness of cancer to therapy), and therametrics
(e.g., monitoring a subject's condition to provide information as to the effect or efficacy of therapy).

"Sample" or "biological sample" as used herein encompasses a variety of sample types, and 
are generally meant to refer to samples of biological fluids or tissues, particularly samples obtained
from tissues, especially from cells of the type associated with a disease or condition for which a
diagnostic application is designed (e.g., ductal adenocarcinoma), and the like. "Sample" or "biological
sample" are meant to encompass blood and other liquid samples of biological origin, solid tissue
samples, such as a biopsy specimen or tissue cultures or cells derived therefrom and the progeny
thereof. These terms encompass samples that have been manipulated in any way after their
procurement as well as derivatives and fractions of samples, where the samples may be manipulated
by, for example, treatment with reagents, solubilization, or enrichment for certain components. The
terms also encompass clinical samples, and also include cells in cell culture, cell supernatants, cell
lysates, serum, plasma, biological fluids, and tissue samples. Where the sample is solid tissue, the cells
of the tissue can be dissociated or tissue sections can be analyzed.

The terms "treatment," "treating," "treat" and the like are used herein to generally refer to
obtaining a desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms
of completely or partially preventing a disease or symptom thereof and/or may be therapeutic in terms
of a partial or complete stabilization or cure for a disease and/or adverse effect attributable to the
disease. "Treatment" as used herein covers any treatment of a disease in a mammal, particularly a
human, and includes: (a) preventing the disease or symptom from occurring in a subject which may
be predisposed to the disease or symptom but has not yet been diagnosed as having it; (b) inhibiting
the disease symptom, i.e., arresting its development; or (c) relieving the disease symptom, i.e., causing
regression of the disease or symptom.

The terms "individual," "subject," "host," and "patient" are used interchangeably herein and refer
to any mammalian subject for whom diagnosis, treatment, or therapy is desired, particularly humans.
Other subjects may include cattle, dogs, cats, guinea pigs, rabbits, rats, mice, horses, and so on.

As used herein the term "isolated" refers to a polynucleotide, a polypeptide, an antibody, or a
host cell that is in an environment different from that in which the polynucleotide, the polypeptide, the
antibody, or the host cell naturally occurs. A polynucleotide, a polypeptide, an antibody, or a host cell
which is isolated is generally substantially purified. As used herein, the term "substantially purified"
refers to a compound (e.g., either a polynucleotide or a polypeptide or an antibody) that is removed
from its natural environment and is at least 60% free, preferably 75% free, and most preferably 90%
free from other components with which it is naturally associated. Thus, for example, a composition
containing A is "substantially free of" B when at least 85% by weight of the total A+B in the
composition is A. Preferably, A comprises at least about 90% by weight of the total of A+B in the
composition, more preferably at least about 95% or even 99% by weight.
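
The weight-fraction test stated above can be illustrated with a short sketch. The function names and the use of plain mass values are assumptions made for illustration only; the 85% default is the threshold recited in the text.

```python
def fraction_of_a(mass_a: float, mass_b: float) -> float:
    """Weight fraction of A in the total of A+B."""
    total = mass_a + mass_b
    if total <= 0:
        raise ValueError("total mass of A+B must be positive")
    return mass_a / total


def is_substantially_free_of_b(mass_a: float, mass_b: float,
                               threshold: float = 0.85) -> bool:
    """True when A makes up at least `threshold` of the total A+B by weight."""
    return fraction_of_a(mass_a, mass_b) >= threshold


# 9 mg of A with 1 mg of B: A is 90% of A+B by weight.
print(is_substantially_free_of_b(9.0, 1.0))
# 8 mg of A with 2 mg of B: A is 80% of A+B by weight.
print(is_substantially_free_of_b(8.0, 2.0))
```

Passing `threshold=0.90`, `0.95`, or `0.99` applies the preferred levels recited in the same passage.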

A "host cell," as used herein, refers to a microorganism or a eukaryotic cell or cell line 
cultured as a unicellular entity which can be, or has been, used as a recipient for a recombinant vector
or other transfer polynucleotides, and include the progeny of the original cell which has been
transfected. It is understood that the progeny of a single cell may not necessarily be completely
identical in morphology or in genomic or total DNA complement as the original parent, due to natural,
accidental, or deliberate mutation.

The terms "cancer," "neoplasm," "tumor," and "carcinoma," are used interchangeably herein
to refer to cells which exhibit relatively autonomous growth, so that they exhibit an aberrant growth
phenotype characterized by a significant loss of control of cell proliferation. In general, cells of
interest for detection or treatment in the present application include precancerous (e.g., benign),
malignant, metastatic, and non-metastatic cells. Detection of cancerous cells is of particular interest.

The use of "e", as in 10e-3, indicates that the number to the left of "e" is raised to the power of
the number to the right of "e" (thus, 10e-3 is 10^-3).
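
Because this convention differs from the way most programming languages read the same characters, a brief sketch may help when values written this way are processed by software. The helper name `parse_e_notation` is hypothetical and is not part of the specification.

```python
def parse_e_notation(text: str) -> float:
    """Read 'XeY' as X raised to the power Y, following the definition above."""
    base, _, exponent = text.strip().lower().partition("e")
    if not base or not exponent:
        raise ValueError(f"not in 'XeY' form: {text!r}")
    return float(base) ** float(exponent)


# Under the definition above, "10e-3" means 10 raised to the power -3.
print(parse_e_notation("10e-3"))

# Python's built-in float() instead reads "10e-3" as 10 multiplied by 10**-3,
# which is ten times larger.
print(float("10e-3"))
```

The two readings differ by a factor equal to the base, so thresholds copied from the text into analysis scripts are worth converting explicitly.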

The term "heterologous" as used herein in the context of, for example, heterologous nucleic
acid or amino acid sequences, heterologous polypeptides, or heterologous nucleic acid, is meant to
refer to material that originates from a source different from that with which it is joined or associated.
For example, two DNA sequences are heterologous to one another if the sequences are from different
genes or from different species. A recombinant host cell containing a sequence that is heterologous to
the host cell can be, for example, a bacterial cell containing a sequence encoding a human
polypeptide.

The invention relates to polynucleotides comprising the disclosed nucleotide sequences, to 
full length cDNA, mRNA, genomic sequences, and genes corresponding to these sequences and 
degenerate variants thereof, and to polypeptides encoded by the polynucleotides of the invention and 
polypeptide variants. The following detailed description describes the polynucleotide compositions 

encompassed by the invention, methods for obtaining cDNA or genomic DNA encoding a full-length
gene product, expression of these polynucleotides and genes, identification of structural motifs of the 
polynucleotides and genes, identification of the function of a gene product encoded by a gene 
corresponding to a polynucleotide of the invention, use of the provided polynucleotides as probes and 
in mapping and in tissue profiling, use of the corresponding polypeptides and other gene products to 

raise antibodies, and use of the polynucleotides and their encoded gene products for therapeutic and
diagnostic purposes. 
Polynucleotide Compositions

The present invention provides isolated polynucleotides that represent genes that are
differentially expressed in human cancer cells. The polynucleotides, as well as polypeptides encoded
thereby, find use in a variety of therapeutic and diagnostic methods.

The scope of the invention with respect to compositions containing the isolated
polynucleotides useful in the methods described herein includes, but is not necessarily limited to,
polynucleotides having a sequence set forth in any one of the polynucleotide sequences provided
herein; polynucleotides obtained from the biological materials described herein or other biological
sources (particularly human sources) by hybridization under stringent conditions (particularly
conditions of high stringency); genes corresponding to the provided polynucleotides; cDNAs
corresponding to the provided polynucleotides; variants of the provided polynucleotides and their
corresponding genes, particularly those variants that retain a biological activity of the encoded gene
product (e.g., a biological activity ascribed to a gene product corresponding to the provided
polynucleotides as a result of the assignment of the gene product to a protein family(ies) and/or
identification of a functional domain present in the gene product). Other nucleic acid compositions
contemplated by and within the scope of the present invention will be readily apparent to one of
ordinary skill in the art when provided with the disclosure here. "Polynucleotide" and "nucleic acid"
as used herein with reference to nucleic acids of the composition are not intended to be limiting as to
the length or structure of the nucleic acid unless specifically indicated.

The invention features polynucleotides that represent genes that are expressed in human
tissue, specifically human breast tissue, particularly polynucleotides that are differentially expressed in
cancerous breast cells. Nucleic acid compositions described herein of particular interest are at least
about 15 bp in length, at least about 30 bp in length, at least about 50 bp in length, at least about 100
bp, at least about 200 bp in length, at least about 300 bp in length, at least about 500 bp in length, at
least about 800 bp in length, at least about 1 kb in length, at least about 2.0 kb in length, at least about
3.0 kb in length, at least about 5 kb in length, at least about 10 kb in length, at least about 50 kb in
length and are usually less than about 200 kb in length. These polynucleotides (or polynucleotide
fragments) have uses that include, but are not limited to, diagnostic probes and primers as starting
materials for probes and primers, as discussed herein.

The subject polynucleotides usually comprise a sequence set forth in any one of the
polynucleotide sequences provided herein, for example, in the sequence listing, incorporated by
reference in a table (e.g., by an NCBI accession number), a cDNA deposited at the A.T.C.C., or a
fragment or variant thereof. A "fragment" or "portion" of a polynucleotide is a contiguous sequence
of residues at least about 10 nt to about 12 nt, 15 nt, 16 nt, 18 nt or 20 nt in length, usually at least
about 22 nt, 24 nt, 25 nt, 30 nt, 40 nt, 50 nt, 60 nt, 70 nt, 80 nt, 90 nt, 100 nt to at least about 150 nt,
200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 500 nt, 800 nt or up to about 1000 nt, 1500 or 2000 nt in
length. In some embodiments, a fragment of a polynucleotide is the coding sequence of a
polynucleotide. A fragment of a polynucleotide may start at position 1 (i.e., the first nucleotide) of a
nucleotide sequence provided herein, or may start at about position 10, 20, 30, 50, 75, 100, 150, 200,
250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1500 or 2000, or an ATG translational
initiation codon of a nucleotide sequence provided herein. In this context "about" includes the
particularly recited value or a value larger or smaller by several (5, 4, 3, 2, or 1) nucleotides. The
described polynucleotides and fragments thereof find use as hybridization probes, PCR primers,
BLAST probes, or as an identifying sequence, for example.

The subject nucleic acids may be variants or degenerate variants of a sequence provided
herein. In general, variants of a polynucleotide provided herein have a fragment of sequence identity
that is greater than at least about 65%, greater than at least about 70%, greater than at least about 75%,
greater than at least about 80%, greater than at least about 85%, or greater than at least about 90%,
95%, 96%, 97%, 98%, 99% or more (i.e., 100%) as compared to an identically sized fragment of a
provided sequence, as determined by the Smith-Waterman homology search algorithm as implemented
in MPSRCH program (Oxford Molecular). For the purposes of this invention, a preferred method of
calculating percent identity is the Smith-Waterman algorithm. Global DNA sequence identity should
be greater than 65% as determined by the Smith-Waterman homology search algorithm as
implemented in MPSRCH program (Oxford Molecular) using an affine gap search with the following search
parameters: gap open penalty, 12; and gap extension penalty, 1.
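
To make the role of the two gap parameters concrete, the following is a minimal sketch of a Smith-Waterman local alignment score with affine gap penalties (the Gotoh recurrences). It is not the MPSRCH implementation. The match and mismatch scores (+5 and -4) are assumptions that do not appear in the text, as is the convention that a gap of length k costs gap_open + (k - 1) * gap_extend.

```python
def smith_waterman_affine(a: str, b: str, match: int = 5, mismatch: int = -4,
                          gap_open: int = 12, gap_extend: int = 1) -> int:
    """Best local alignment score of a and b with affine gap penalties."""
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols        # H: best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * cols  # F: best score ending in a gap that consumes a[i-1]
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf            # E: best score ending in a gap that consumes b[j-1]
        for j in range(1, cols):
            # Open a new gap from H, or extend the gap already in progress.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            step = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never scores below zero.
            h_cur[j] = max(0, h_prev[j - 1] + step, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


left, right = "ACGTACGTAC", "TGCATGCATG"
# Two extra bases in the second sequence force one gap of length 2 (cost 12 + 1).
print(smith_waterman_affine(left + right, left + "CC" + right))
```

With a high opening penalty and a low extension penalty, one long gap is cheaper than several short ones, which is the behavior the parameters 12 and 1 select for. Percent identity would additionally require a traceback over the chosen alignment, which this sketch omits.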

The subject nucleic acid compositions include full-length cDNAs or mRNAs that encompass
an identifying sequence of contiguous nucleotides from any one of the polynucleotide sequences
provided herein.

As discussed above, the polynucleotides useful in the methods described herein also include
polynucleotide variants having sequence similarity or sequence identity. Nucleic acids having
sequence similarity are detected by hybridization under low stringency conditions, for example, at
50°C and 10XSSC (0.9 M saline/0.09 M sodium citrate) and remain bound when subjected to
washing at 55°C in 1XSSC. Sequence identity can be determined by hybridization under high
stringency conditions, for example, at 50°C or higher and 0.1XSSC (9 mM saline/0.9 mM sodium
citrate). Hybridization methods and conditions are well known in the art, see, e.g., USPN 5,707,829.
Nucleic acids that are substantially identical to the provided polynucleotide sequences, e.g., allelic
variants, genetically altered versions of the gene, etc., bind to the provided polynucleotide sequences
under stringent hybridization conditions. By using probes, particularly labeled probes of DNA
sequences, one can isolate homologous or related genes. The source of homologous genes can be any
species, e.g., primate species, particularly human; rodents, such as rats and mice; canines, felines,
bovines, ovines, equines, yeast, nematodes, etc.
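
When preparing hybridization and wash buffers, the salt concentration for a given SSC strength can be computed as sketched below. The sketch assumes the common laboratory convention that 1X SSC is 0.15 M sodium chloride and 0.015 M sodium citrate; that convention is not stated in the text, and the function names are illustrative only.

```python
# Assumed convention (not stated in the text): composition of 1X SSC.
SSC_1X_NACL_M = 0.150
SSC_1X_CITRATE_M = 0.015


def ssc_molarity(fold: float) -> tuple[float, float]:
    """Return (NaCl, sodium citrate) molarity for a given SSC strength."""
    return fold * SSC_1X_NACL_M, fold * SSC_1X_CITRATE_M


def fold_from_nacl(nacl_molar: float) -> float:
    """Return the SSC strength implied by a stated NaCl molarity."""
    return nacl_molar / SSC_1X_NACL_M


for strength in (10, 1, 0.1):
    nacl, citrate = ssc_molarity(strength)
    print(f"{strength}X SSC: {nacl:.4f} M NaCl, {citrate:.4f} M sodium citrate")

# SSC strengths implied by the parenthetical NaCl molarities recited above.
print(fold_from_nacl(0.9), fold_from_nacl(0.009))
```

Under this assumed convention, the parenthetical molarities recited above correspond to SSC strengths somewhat lower than the stated 10X and 0.1X, so a reader reproducing the conditions may wish to confirm which figure is intended.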
In one embodiment, hybridization is performed using a fragment of at least 15 contiguous
nucleotides (nt) of at least one of the polynucleotide sequences provided herein. That is, when at least
15 contiguous nt of one of the disclosed polynucleotide sequences is used as a probe, the probe will
preferentially hybridize with a nucleic acid comprising the complementary sequence, allowing the
identification and retrieval of the nucleic acids that uniquely hybridize to the selected probe. Probes
from more than one polynucleotide sequence provided herein can hybridize with the same nucleic acid
if the cDNA from which they were derived corresponds to one mRNA.
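
As an in silico analogue of probing with a fragment of at least 15 contiguous nucleotides, the sketch below locates sites in a target strand that are perfectly complementary to a probe. Exact string matching is a simplifying assumption, since real hybridization tolerates mismatches depending on stringency. The function names and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_sites(probe: str, target: str, min_len: int = 15) -> list[int]:
    """0-based start positions in `target` that are fully complementary to `probe`."""
    if len(probe) < min_len:
        raise ValueError(f"probe must be at least {min_len} nt long")
    site = reverse_complement(probe)
    hits = []
    start = target.find(site)
    while start != -1:
        hits.append(start)
        start = target.find(site, start + 1)  # allow overlapping sites
    return hits


probe = "ACGTTGCAAGCTTAG"  # hypothetical 15-nt probe
target = "GG" + reverse_complement(probe) + "AA"
print(probe_sites(probe, target))
```

A probe that yields exactly one site in the sequences searched behaves like the "uniquely hybridizing" probe described above.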

Polynucleotides contemplated for use in the invention also include those having a sequence of
naturally occurring variants of the nucleotide sequences (e.g., degenerate variants (e.g., sequences that
encode the same polypeptides but, due to the degenerate nature of the genetic code, differ in
nucleotide sequence), allelic variants, etc.). Variants of the polynucleotides contemplated by the
invention are identified by hybridization of putative variants with nucleotide sequences disclosed
herein, preferably by hybridization under stringent conditions. For example, by using appropriate
wash conditions, variants of the polynucleotides described herein can be identified where the allelic
variant exhibits at most about 25-30% base pair (bp) mismatches relative to the selected
polynucleotide probe. In general, allelic variants contain 15-25% bp mismatches, and can contain as
little as even 5-15%, or 2-5%, or 1-2% bp mismatches, as well as a single bp mismatch.
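
The mismatch percentages recited above can be computed for two aligned, equal-length sequences as sketched below. The function names and the 30% default ceiling (taken from the upper end of the "at most about 25-30%" range) are illustrative assumptions, and the sketch does not handle insertions or deletions.

```python
def percent_mismatch(probe: str, candidate: str) -> float:
    """Percentage of positions that differ between two equal-length sequences."""
    if not probe or len(probe) != len(candidate):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for x, y in zip(probe.upper(), candidate.upper()) if x != y)
    return 100.0 * mismatches / len(probe)


def within_mismatch_ceiling(probe: str, candidate: str,
                            max_percent: float = 30.0) -> bool:
    """True when the candidate differs from the probe by at most `max_percent`."""
    return percent_mismatch(probe, candidate) <= max_percent


probe = "ACGTACGTAC"
variant = "ACGTACGTAA"  # differs from the probe at the last position only
print(percent_mismatch(probe, variant))
print(within_mismatch_ceiling(probe, variant))
```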

The invention also encompasses homologs corresponding to any one of the polynucleotide
sequences provided herein, where the source of homologous genes can be any mammalian species,
e.g., primate species, particularly human; rodents, such as rats; canines, felines, bovines, ovines,
equines, yeast, nematodes, etc. Between mammalian species, e.g., human and mouse, homologs
generally have substantial sequence similarity, e.g., at least 75% sequence identity, usually at least
80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or
even 100% identity between nucleotide sequences. Sequence similarity is calculated based on a
reference sequence, which may be a subset of a larger sequence, such as a conserved motif, coding
region, flanking region, etc. A reference sequence will usually be at least a fragment of a
polynucleotide sequence and may extend to the complete sequence that is being compared.
Algorithms for sequence analysis are known in the art, such as gapped BLAST, described in Altschul,
et al. Nucleic Acids Res. (1997) 25:3389-3402, or TeraBLAST available from TimeLogic Corp.
(Crystal Bay, Nevada).

Moreover, representative examples of polynucleotide fragments of the invention (useful, for
example, as probes), include, for example, fragments comprising, or alternatively consisting of, a
sequence from about nucleotide number 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350,
351-400, 401-450, 451-500, 501-550, 551-600, 651-700, 701-750, 751-800, 800-850, 851-900, 901-950,
951-1000, 1001-1050, 1051-1100, 1101-1150, 1151-1200, 1201-1250, 1251-1300, 1301-1350,
1351-1400, 1401-1450, 1451-1500, 1501-1550, 1551-1600, 1601-1650, 1651-1700, 1701-1750, 
1751-1800, 1801-1850, 1851-1900, 1901-1950, 1951-2000, 2001-2050, 2051-2100, 2101-2150,
2151-2200, 2201-2250, 2251-2300, 2301-2350, 2351-2400, 2401-2450, 2451-2500, 2501-2550, 
2551-2600, 2601-2650, 2651-2700, 2701-2750, 2751-2800, 2801-2850, 2851-2900, 2901-2950, 
2951-3000, 3001-3050, 3051-3100, 3101-3150, 3151-3200, 3201-3250, 3251-3300, 3301-3350, 

3351-3400, 3401-3450, 3451-3500, 3501-3550, 3551-3600, 3601-3650, 3651-3700, 3701-3750,
3751-3800, 3801-3850, 3851-3900, 3901-3950, 3951-4000, 4001-4050, 4051-4100, 4101-4150, 
4151-4200, 4201-4250, 4251-4300, 4301-4350, 4351-4400, 4401-4450, 4451-4500, 4501-4550, 
4551-4600, 4601-4650, 4651-4700, 4701-4750, 4751-4800, 4801-4850, 4851-4900, 4901-4950, 
4951-5000, 5001-5050, 5051-5100, 5101-5150, 5151-5200, 5201-5250, 5251-5300, 5301-5350, 

5351-5400, 5401-5450, 5451-5500, 5501-5550, 5551-5600, 5601-5650, 5651-5700, 5701-5750,

5751-5800, 5801-5850, 5851-5900, 5901-5950, 5951-6000, 6001-6050, 6051-6100, 6101-6150, and
6151 of a subject nucleic acid, or the complementary strand thereto. In this context "about" includes
the particularly recited range or a range larger or smaller by several (5, 4, 3, 2, or 1) nucleotides, at
either terminus or at both termini. In some embodiments, these fragments encode a polypeptide which
has a functional activity (e.g., biological activity) whereas in other embodiments, these fragments are
probes, or starting materials for probes. Polynucleotides which hybridize to one or more of these
nucleic acid molecules under stringent hybridization conditions or alternatively, under lower
stringency conditions, are also encompassed by the invention, as are polypeptides encoded by these
polynucleotides or fragments.
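
The consecutive 50-nucleotide ranges listed above follow a regular pattern that can be generated for a sequence of any length. The sketch below uses 1-based inclusive coordinates to match the numbering in the text; the function names are hypothetical, and the "about" tolerance of up to 5 nucleotides at either terminus is not modeled.

```python
from typing import Iterator


def fragment_windows(seq_length: int, size: int = 50) -> Iterator[tuple[int, int]]:
    """Yield 1-based inclusive (start, end) ranges covering a sequence in order."""
    for start in range(1, seq_length + 1, size):
        yield start, min(start + size - 1, seq_length)


def fragment(seq: str, start: int, end: int) -> str:
    """Extract the fragment spanning 1-based inclusive positions start..end."""
    return seq[start - 1:end]


# A 120-nt sequence yields two full windows and one shorter final window.
print(list(fragment_windows(120)))
print(fragment("ACGTACGT", 3, 5))
```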

20 The subject nucleic acids can be cDNAs or genomic DNAs, as well as fragments thereof, 

particularly fragments that encode a biologically active gene product and/or are useful in the methods 
disclosed herein (e.g., in diagnosis, as aunique identifier of a differentially expressed gene of interest, 
etc.). The term "cDNA" as used herein is intended to include all nucleic acids that share the 
arrangement of sequence elements found in native mature mRNA species, where sequence elements 

25 are exons and 3 ' and 5 ' non-coding regions. Normally mRNA species have contiguous exons, with 
the intervening introns, when present, being removed by nuclear RNA splicing, to create a continuous 
open reading frame encoding a polypeptide. mRNA species can also exist with both exons and 
introns, where the introns may be removed by alternative splicing. Furthermore it should be noted that 
different species of mRNAs encoded by the same genomic sequence can exist at varying levels in a 

cell, and detection of these various levels of mRNA species can be indicative of differential expression 
of the encoded gene product in the cell. 

A genomic sequence of interest comprises the nucleic acid present between the initiation 
codon and the stop codon, as defined in the listed sequences, including all of the introns that are 
normally present in a native chromosome. It can further include the 3' and 5' untranslated regions 
found in the mature mRNA. It can further include specific transcriptional and translational regulatory 
sequences, such as promoters, enhancers, etc., including about 1 kb, but possibly more, of flanking 
genomic DNA at either the 5' or 3' end of the transcribed region. The genomic DNA can be 
isolated as a fragment of 100 kbp or smaller, substantially free of flanking chromosomal 
sequence. The genomic DNA flanking the coding region, either 3' or 5', or internal regulatory 
sequences as sometimes found in introns, contains sequences required for proper tissue, stage-specific, 
or disease-state specific expression. 

The nucleic acid compositions of the subject invention can encode all or a part of the 
naturally-occurring polypeptides. Double or single stranded fragments can be obtained from the DNA 
sequence by chemically synthesizing oligonucleotides in accordance with conventional methods, by 
restriction enzyme digestion, by PCR amplification, etc. 

Probes specific to the polynucleotides described herein can be generated using the 
polynucleotide sequences disclosed herein. The probes are usually a fragment of a polynucleotide 
sequence provided herein. The probes can be synthesized chemically or can be generated from 
longer polynucleotides using restriction enzymes. The probes can be labeled, for example, with a 
radioactive, biotinylated, or fluorescent tag. Preferably, probes are designed based upon an identifying 
sequence of any one of the polynucleotide sequences provided herein. More preferably, probes are 
designed based on a contiguous sequence of one of the subject polynucleotides that remains unmasked 
following application of a masking program for masking low complexity (e.g., XBLAST, 
RepeatMasker, etc.) to the sequence, i.e., one would select an unmasked region, as indicated by the 
polynucleotides outside the poly-n stretches of the masked sequence produced by the masking 
program. 
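
By way of illustration only, the selection of an unmasked region can be sketched as follows. The sketch assumes that the masking program marks low-complexity stretches with runs of N (or n); the function names `unmasked_regions` and `longest_unmasked` and the 20-nt default minimum length are hypothetical choices made for this example and are not taken from XBLAST or RepeatMasker.

```python
import re

def unmasked_regions(masked_seq, min_len=20):
    """Return (start, end, subsequence) for each unmasked stretch.

    Coordinates are 0-based and end-exclusive. Assumption: masked
    positions are written as N or n in the masked sequence.
    """
    regions = []
    for m in re.finditer(r"[^Nn]+", masked_seq):
        if m.end() - m.start() >= min_len:
            regions.append((m.start(), m.end(), m.group()))
    return regions

def longest_unmasked(masked_seq):
    """Return the longest unmasked stretch, or None if all is masked."""
    regions = unmasked_regions(masked_seq, min_len=1)
    if not regions:
        return None
    return max(regions, key=lambda r: r[1] - r[0])
```

A probe would then be chosen from within one of the returned stretches; whether a given stretch is otherwise suitable (e.g., in melting temperature or specificity) is outside the scope of this sketch.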

The polynucleotides of interest in the subject invention are isolated and obtained in substantial 
purity, generally as other than an intact chromosome. Usually, the polynucleotides, either as DNA or 
RNA, will be obtained substantially free of other naturally-occurring nucleic acid sequences that they 
are usually associated with, generally being at least about 50%, usually at least about 90% pure, and 
are typically "recombinant", e.g., flanked by one or more nucleotides with which they are not normally 
associated on a naturally occurring chromosome. 

The polynucleotides described herein can be provided as a linear molecule or within a circular 
molecule, and can be provided within autonomously replicating molecules (vectors) or within 
molecules without replication sequences. Expression of the polynucleotides can be regulated by their 

own or by other regulatory sequences known in the art. The polynucleotides can be introduced into 
suitable host cells using a variety of techniques available in the art, such as transferrin polycation- 
mediated DNA transfer, transfection with naked or encapsulated nucleic acids, liposome-mediated 
DNA transfer, intracellular transportation of DNA-coated latex beads, protoplast fusion, viral 
infection, electroporation, gene gun, calcium phosphate-mediated transfection, and the like. 

The nucleic acid compositions described herein can be used to, for example, produce 
polypeptides, as probes for the detection of mRNA in biological samples (e.g., extracts of human 
cells) or cDNA produced from such samples, to generate additional copies of the polynucleotides, to 
generate ribozymes or antisense oligonucleotides, and as single stranded DNA probes or as triple- 
strand forming oligonucleotides. The probes described herein can be used to, for example, determine 
the presence or absence of any one of the polynucleotides provided herein or variants thereof in a 
sample. These and other uses are described in more detail below. 

The subject nucleic acid 
compositions can be used, for example, to produce polypeptides, as probes for the detection of mRNA 
of the invention in biological samples (e.g., extracts of human cells), to generate additional copies of 
the polynucleotides, to generate ribozymes or antisense oligonucleotides, and as single stranded DNA 
probes or as triple-strand forming oligonucleotides. The probes described herein can be used to, for 
example, determine the presence or absence of the polynucleotide sequences as shown in SEQ ID 
NOS:1-1485 or variants thereof in a sample. These and other uses are described in more detail below. 
Use of Polynucleotides to Obtain Full-Length cDNA, Gene, and Promoter Region 
In one embodiment, the polynucleotides are useful as starting materials to construct larger 
molecules. In one example, the polynucleotides of the invention are used to construct polynucleotides 
that encode a larger polypeptide (e.g., up to the full-length native polypeptide as well as fusion 
proteins comprising all or a portion of the native polypeptide) or may be used to produce haptens of 
the polypeptide (e.g., polypeptides useful to generate antibodies). 

In one particular example, the polynucleotides of the invention are used to make or isolate 
cDNA molecules encoding all or a portion of a naturally-occurring polypeptide. Full-length cDNA 
molecules comprising the disclosed polynucleotides are obtained as follows. A polynucleotide having 
a sequence of one of SEQ ID NOS:1-1485, or a portion thereof comprising at least 12, 15, 18, or 20 
nt, is used as a hybridization probe to detect hybridizing members of a cDNA library using probe 
design methods, cloning methods, and clone selection techniques such as those described in USPN 
5,654,173. Libraries of cDNA are made from selected tissues, such as normal or tumor tissue, or from 
tissues of a mammal treated with, for example, a pharmaceutical agent. Preferably, the tissue is the 
same as the tissue from which the polynucleotides of the invention were isolated, as both the 
polynucleotides described herein and the cDNA represent expressed genes. Most preferably, the 
cDNA library is made from the biological material described herein in the Examples. The choice of 
cell type for library construction can be made after the identity of the protein encoded by the gene 
corresponding to the polynucleotide of the invention is known. This will indicate which tissue and 
cell types are likely to express the related gene, and thus represent a suitable source for the mRNA for 
generating the cDNA. Where the provided polynucleotides are isolated from cDNA libraries, the 
libraries are prepared from mRNA of human prostate cells, more preferably, human prostate cancer 
cells. 

Techniques for producing and probing nucleic acid sequence libraries are described, for 
example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., (1989) Cold Spring 
Harbor Press, Cold Spring Harbor, NY. The cDNA can be prepared by using primers based on 
polynucleotides comprising a sequence of SEQ ID NOS: 1-1485. In one embodiment, the cDNA 
library can be made from only poly-adenylated mRNA. Thus, poly-T primers can be used to prepare 
cDNA from the mRNA. 

Members of the library that are larger than the provided polynucleotides, and preferably that 
encompass the complete coding sequence of the native message, are obtained. In order to confirm that 
the entire cDNA has been obtained, RNA protection experiments are performed as follows. 
Hybridization of a full-length cDNA to an mRNA will protect the RNA from RNase degradation. If 
the cDNA is not full length, then the portions of the mRNA that are not hybridized will be subject to 
RNase degradation. This is assayed, as is known in the art, by changes in electrophoretic mobility on 
polyacrylamide gels, or by detection of released monoribonucleotides. Sambrook et al., Molecular 
Cloning: A Laboratory Manual, 2nd Ed., (1989) Cold Spring Harbor Press, Cold Spring Harbor, NY. 
In order to obtain additional sequences 5' to the end of a partial cDNA, 5' RACE (PCR Protocols: A 
Guide to Methods and Applications, (1990) Academic Press, Inc.) can be performed. 

Genomic DNA is isolated using the provided polynucleotides in a manner similar to the 

isolation of full-length cDNAs. Briefly, the provided polynucleotides, or portions thereof, are used as 
probes to libraries of genomic DNA. Preferably, the library is obtained from the cell type that was 
used to generate the polynucleotides of the invention, but this is not essential. Most preferably, the 
genomic DNA is obtained from the biological material described herein in the Examples. Such 

libraries can be in vectors suitable for carrying large segments of a genome, such as P1 or YAC, as 

described in detail in Sambrook et al., supra, 9.4-9.30. In addition, genomic sequences can be isolated 
from human BAC libraries, which are commercially available from Research Genetics, Inc., 
Huntsville, Alabama, USA, for example. In order to obtain additional 5' or 3' sequences, chromosome 
walking is performed, as described in Sambrook et al., such that adjacent and overlapping fragments 

of genomic DNA are isolated. These are mapped and pieced together, as is known in the art, using 
restriction digestion enzymes and DNA ligase. 

Using the polynucleotide sequences of the invention, corresponding full-length genes can be 
isolated using both classical and PCR methods to construct and probe cDNA libraries. Using either 
method, Northern blots, preferably, are performed on a number of cell types to determine which cell 

lines express the gene of interest at the highest level. Classical methods of constructing cDNA 
libraries are taught in Sambrook et al., supra. With these methods, cDNA can be produced from 
mRNA and inserted into viral or expression vectors. Typically, libraries of mRNA comprising 
poly(A) tails can be produced with poly(T) primers. Similarly, cDNA libraries can be produced using 
the instant sequences as primers. 

PCR methods are used to amplify the members of a cDNA library that comprise the desired 

insert. In this case, the desired insert will contain sequence from the full length cDNA that 
corresponds to the instant polynucleotides. Such PCR methods include gene trapping and RACE 
methods. Gene trapping entails inserting a member of a cDNA library into a vector. The vector then 
is denatured to produce single stranded molecules. Next, a substrate-bound probe, such as a 
biotinylated oligo, is used to trap cDNA inserts of interest. Biotinylated probes can be linked to an 

avidin-bound solid substrate. PCR methods can be used to amplify the trapped cDNA. To trap 
sequences corresponding to the full length genes, the labeled probe sequence is based on the 
polynucleotide sequences of the invention. Random primers or primers specific to the library vector 
can be used to amplify the trapped cDNA. Such gene trapping techniques are described in Gruber et 
al., WO 95/04745 and Gruber et al., USPN 5,500,356. Kits are commercially available to perform 

gene trapping experiments from, for example, Life Technologies, Gaithersburg, Maryland, USA. 

"Rapid amplification of cDNA ends," or RACE, is a PCR method of amplifying cDNAs from 
a number of different RNAs. The cDNAs are ligated to an oligonucleotide linker, and amplified by 
PCR using two primers. One primer is based on sequence from the instant polynucleotides, for which 
full length sequence is desired, and a second primer comprises sequence that hybridizes to the 

oligonucleotide linker to amplify the cDNA. A description of this method is reported in WO 
97/19110. In preferred embodiments of RACE, a common primer is designed to anneal to an arbitrary 
adaptor sequence ligated to cDNA ends (Apte and Siebert, Biotechniques (1993) 15:890-893; 
Edwards et al., Nuc. Acids Res. (1991) 19:5227-5232). When a single gene-specific RACE primer is 
paired with the common primer, preferential amplification of sequences between the single gene 

specific primer and the common primer occurs. Commercial cDNA pools modified for use in RACE 
are available. 

Another PCR-based method generates a full-length cDNA library with anchored ends without 
needing specific knowledge of the cDNA sequence. The method uses lock-docking primers (I-VI), 
where one primer, poly TV (I-III), locks over the polyA tail of eukaryotic mRNA producing first strand 
synthesis and a second primer, polyGH (IV-VI), locks onto the polyC tail added by terminal 
deoxynucleotidyl transferase (TdT) (see, e.g., WO 96/40998). 

The promoter region of a gene generally is located 5' to the initiation site for RNA 
polymerase II. Hundreds of promoter regions contain the "TATA" box, a sequence such as TATTA 
or TATAA, which is sensitive to mutations. The promoter region can be obtained by performing 5' 
RACE using a primer from the coding region of the gene. Alternatively, the cDNA can be used as a 
probe for the genomic sequence, and the region 5' to the coding region is identified by "walking up." 
If the gene is highly expressed or differentially expressed, the promoter from the gene can be of use in 
a regulatory construct for a heterologous gene. 
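
As a simple illustration of locating the TATA-like sequences mentioned above, the following sketch scans a sequence for TATAA or TATTA. The function name `find_tata_like` is hypothetical, and a motif hit by itself is only a candidate; it does not establish that the surrounding region functions as a promoter.

```python
import re

# Lookahead so that overlapping occurrences are all reported.
TATA_PATTERN = re.compile(r"(?=(TAT[AT]A))")

def find_tata_like(seq):
    """Return 0-based start positions of TATAA or TATTA in seq.

    Only the strand as given is scanned; the complementary strand
    would need to be scanned separately.
    """
    return [m.start() for m in TATA_PATTERN.finditer(seq.upper())]
```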

Once the full-length cDNA or gene is obtained, DNA encoding variants can be prepared by 
site-directed mutagenesis, described in detail in Sambrook et al., 15.3-15.63. The choice of codon or 
nucleotide to be replaced can be based on disclosure herein on optional changes in amino acids to 
achieve altered protein structure and/or function. 

As an alternative method to obtaining DNA or RNA from a biological material, nucleic acid 
comprising nucleotides having the sequence of one or more polynucleotides of the invention can be 
synthesized. Thus, the invention encompasses nucleic acid molecules ranging in length from 15 nt 
(corresponding to at least 15 contiguous nt of one of SEQ ID NOS: 1-1485) up to a maximum length 
suitable for one or more biological manipulations, including replication and expression, of the nucleic 
acid molecule. The invention includes but is not limited to (a) nucleic acid having the size of a full 
gene, and comprising at least one of SEQ ID NOS:1-1485; (b) the nucleic acid of (a) also comprising 
at least one additional gene, operably linked to permit expression of a fusion protein; (c) an expression 
vector comprising (a) or (b); (d) a plasmid comprising (a) or (b); and (e) a recombinant viral particle 
comprising (a) or (b). Once provided with the polynucleotides disclosed herein, construction or 
preparation of (a)-(e) is well within the skill in the art. 

The sequence of a nucleic acid comprising at least 15 contiguous nt of at least any one of SEQ 
ID NOS:1-1485, preferably the entire sequence of at least any one of SEQ ID NOS:1-1485, is not 
limited and can be any sequence of A, T, G, and/or C (for DNA) and A, U, G, and/or C (for RNA) or 
modified bases thereof, including inosine and pseudouridine. The choice of sequence will depend on 
the desired function and can be dictated by coding regions desired, the intron-like regions desired, and 
the regulatory regions desired. Where the entire sequence of any one of SEQ ID NOS:1-1485 is 
within the nucleic acid, the nucleic acid obtained is referred to herein as a polynucleotide comprising 
the sequence of any one of SEQ ID NOS:1-1485. 
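
The "at least 15 contiguous nt" criterion can be checked computationally. The following is a minimal sketch under stated assumptions: the function name `shares_contiguous_run` is hypothetical, RNA is compared to DNA by treating U as T, modified bases are not handled, and only the strand as given is compared (the complementary strand is not checked).

```python
def shares_contiguous_run(candidate, reference, k=15):
    """Return True if candidate contains at least k contiguous
    nucleotides that also occur contiguously in reference."""
    cand = candidate.upper().replace("U", "T")
    ref = reference.upper().replace("U", "T")
    if len(cand) < k or len(ref) < k:
        return False
    # Index every k-mer of the reference, then look for any shared k-mer.
    ref_kmers = {ref[i:i + k] for i in range(len(ref) - k + 1)}
    return any(cand[i:i + k] in ref_kmers
               for i in range(len(cand) - k + 1))
```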

Expression of Polypeptide Encoded by Full-Length cDNA or Full-Length Gene 
The provided polynucleotides (e.g., a polynucleotide having a sequence of one of SEQ ID 
NOS:1-1485), the corresponding cDNA, or the full-length gene is used to express a partial or 
complete gene product. Constructs of polynucleotides having sequences of SEQ ID NOS:1-1485 can 
also be generated synthetically. Alternatively, single-step assembly of a gene and entire plasmid from 
large numbers of oligodeoxyribonucleotides is described by, e.g., Stemmer et al., Gene (Amsterdam) 
(1995) 164(1):49-53. In this method, assembly PCR (the synthesis of long DNA sequences from large 
numbers of oligodeoxyribonucleotides (oligos)) is described. The method is derived from DNA 
shuffling (Stemmer, Nature (1994) 370:389-391), and does not rely on DNA ligase, but instead relies 
on DNA polymerase to build increasingly longer DNA fragments during the assembly process. 

Appropriate polynucleotide constructs are purified using standard recombinant DNA 
techniques as described in, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 
2nd Ed., (1989) Cold Spring Harbor Press, Cold Spring Harbor, NY, and under current regulations 
described in United States Dept. of HHS, National Institutes of Health (NIH) Guidelines for 
Recombinant DNA Research. The gene product encoded by a polynucleotide of the invention is 
expressed in any expression system, including, for example, bacterial, yeast, insect, amphibian and 
mammalian systems. Vectors, host cells and methods for obtaining expression in same are well 
known in the art. Suitable vectors and host cells are described in USPN 5,654,173. 

Polynucleotide molecules comprising a polynucleotide sequence provided herein are generally 
propagated by placing the molecule in a vector. Viral and non-viral vectors are used, including 
plasmids. The choice of plasmid will depend on the type of cell in which propagation is desired and 
the purpose of propagation. Certain vectors are useful for amplifying and making large amounts of 
the desired DNA sequence. Other vectors are suitable for expression in cells in culture. Still other 
vectors are suitable for transfer and expression in cells in a whole animal or person. The choice of 
appropriate vector is well within the skill of the art. Many such vectors are available commercially. 
Methods for preparation of vectors comprising a desired sequence are well known in the art. 

The polynucleotides set forth in SEQ ID NOS:1-1485 or their corresponding full-length 
polynucleotides are linked to regulatory sequences as appropriate to obtain the desired expression 
properties. These can include promoters (attached either at the 5' end of the sense strand or at the 3' 
end of the antisense strand), enhancers, terminators, operators, repressors, and inducers. The 
promoters can be regulated or constitutive. In some situations it may be desirable to use conditionally 
active promoters, such as tissue-specific or developmental stage-specific promoters. These are linked 
to the desired nucleotide sequence using the techniques described above for linkage to vectors. Any 
techniques known in the art can be used. 

When any of the above host cells, or other appropriate host cells or organisms, are used to 
replicate and/or express the polynucleotides or nucleic acids of the invention, the resulting replicated 
nucleic acid, RNA, expressed protein or polypeptide, is within the scope of the invention as a product 
of the host cell or organism. The product is recovered by any appropriate means known in the art. 

Once the gene corresponding to a selected polynucleotide is identified, its expression can be 
regulated in the cell to which the gene is native. For example, an endogenous gene of a cell can be 
regulated by an exogenous regulatory sequence as disclosed in USPN 5,641,670. 

Identification of Functional and Structural Motifs 

Translations of the nucleotide sequence of the provided polynucleotides, cDNAs or full genes 
can be aligned with individual known sequences. Similarity with individual sequences can be used to 
determine the activity of the polypeptides encoded by the polynucleotides of the invention. Also, 
sequences exhibiting similarity with more than one individual sequence can exhibit activities that are 
characteristic of either or both individual sequences. 

The full length sequences and fragments of the polynucleotide sequences of the nearest 
neighbors as identified through, for example, BLAST-based searching, can be used as probes and 
primers to identify and isolate the full length sequence corresponding to provided polynucleotides. 
The nearest neighbors can indicate a tissue or cell type to be used to construct a library for the full- 
length sequences corresponding to the provided polynucleotides. 

Typically, a selected polynucleotide is translated in all six frames to determine the best 
alignment with the individual sequences. The sequences disclosed herein in the Sequence Listing are 
in a 5' to 3' orientation and translation in three frames can be sufficient (with a few specific 
exceptions as described in the Examples). These amino acid sequences are referred to, generally, as 
query sequences, which will be aligned with the individual sequences. Databases with individual 
sequences are described in "Computer Methods for Macromolecular Sequence Analysis" Methods in 
Enzymology (1996) 266, Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San 
Diego, California, USA. Databases include GenBank, EMBL, and DNA Database of Japan (DDBJ). 
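
Translation in all six frames, as described above, can be sketched as follows. This is an illustrative example only: it assumes the standard genetic code, reports any codon containing an ambiguous base as X, and uses hypothetical function names (`translate`, `six_frame_translations`) and frame labels (+1 to +3 for the given strand, -1 to -3 for its reverse complement).

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def translate(seq, frame=0):
    """Translate seq starting at offset frame (0, 1, or 2)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def six_frame_translations(seq):
    """Return a dict mapping frame label to the translated sequence."""
    rc = reverse_complement(seq)
    frames = {f"+{f + 1}": translate(seq, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc, f) for f in range(3)})
    return frames
```

Where a sequence is known to be in a 5' to 3' orientation, only the three forward frames (+1 to +3) need to be inspected.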

Query and individual sequences can be aligned using the methods and computer programs 
described above, and include BLAST 2.0, available over the world wide web at a site supported by the 
National Center for Biotechnology Information, which is supported by the National Library of 
Medicine and the National Institutes of Health, or TeraBLAST available from TimeLogic Corp. 
(Crystal Bay, Nevada). See also Altschul, et al. Nucleic Acids Res. (1997) 25:3389-3402. Another 
alignment algorithm is Fasta, available in the Genetics Computing Group (GCG) package, Madison, 
Wisconsin, USA, a wholly owned subsidiary of Oxford Molecular Group, Inc. Other techniques for 
alignment are described in Doolittle, supra. Preferably, an alignment program that permits gaps in the 
sequence is utilized to align the sequences. The Smith-Waterman algorithm is one type of algorithm that permits 
gaps in sequence alignments. See Meth. Mol. Biol. (1997) 70:173-187. Also, the GAP program 
using the Needleman and Wunsch alignment method can be utilized to align sequences. An 
alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH 
uses a Smith-Waterman algorithm to score sequences on a massively parallel computer. This 
approach improves the ability to identify distantly related matches, and is especially 
tolerant of small gaps and nucleotide sequence errors. Amino acid sequences encoded by the provided 
polynucleotides can be used to search both protein and DNA databases. Incorporated herein by 
reference are all sequences that have been made public as of the filing date of this application by any 
of the DNA or protein sequence databases, including the patent databases (e.g., GeneSeq). Also 
incorporated by reference are those sequences that have been submitted to these databases as of the 
filing date of the present application but not made public until after the filing date of the present 
application. 
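
For illustration, a gapped local alignment score of the Smith-Waterman type can be computed as sketched below. This is a minimal sketch and not the implementation used by MPSRCH or any other named program: the match, mismatch, and gap values are arbitrary assumptions, a simple linear gap penalty is used in place of a substitution matrix with affine gaps, and only the best score (not the alignment itself) is returned.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and
    keeps only two rows of the dynamic-programming matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            value = max(0,                 # start a new local alignment
                        prev[j - 1] + s,   # align a[i-1] with b[j-1]
                        prev[j] + gap,     # gap in b
                        cur[j - 1] + gap)  # gap in a
            cur.append(value)
            best = max(best, value)
        prev = cur
    return best
```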

Results of individual and query sequence alignments can be divided into three categories: 
high similarity, weak similarity, and no similarity. Individual alignment results ranging from high 
similarity to weak similarity provide a basis for determining polypeptide activity and/or structure. 
Parameters for categorizing individual results include: percentage of the alignment region length 
where the strongest alignment is found, percent sequence identity, and p value. The percentage of the 
alignment region length is calculated by counting the number of residues of the individual sequence 
found in the region of strongest alignment, e.g., the contiguous region of the individual sequence that 
contains the greatest number of residues that are identical to the residues of the corresponding region 
of the aligned query sequence. This number is divided by the total residue length of the query 
sequence to calculate a percentage. For example, a query sequence of 20 amino acid residues might 
be aligned with a 20 amino acid region of an individual sequence. The individual sequence might be 
identical to amino acid residues 5, 9-15, and 17-19 of the query sequence. The region of strongest 
alignment is thus the region stretching from residue 9-19, an 11 amino acid stretch. The percentage of 
the alignment region length is: 11 (length of the region of strongest alignment) divided by 20 (query 
sequence length), or 55%. 

Percent sequence identity is calculated by counting the number of amino acid matches 
between the query and individual sequence and dividing the total number of matches by the number of 
residues of the individual sequence found in the region of strongest alignment. Thus, the percent 
identity in the example above would be 10 matches divided by 11 amino acids, or approximately 
90.9%. 
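
The two calculations in the example above can be reproduced with a short sketch. The function name `alignment_metrics` is hypothetical, residue positions are 1-based as in the text, and the boundaries of the region of strongest alignment (residues 9-19) are supplied by the caller rather than derived by the code.

```python
def alignment_metrics(query_len, identical_positions, region_start, region_end):
    """Return (percent alignment region length, percent sequence identity).

    identical_positions holds the 1-based query positions at which the
    individual sequence is identical to the query sequence.
    """
    region_len = region_end - region_start + 1
    matches = sum(1 for p in identical_positions
                  if region_start <= p <= region_end)
    pct_region_length = 100.0 * region_len / query_len
    pct_identity = 100.0 * matches / region_len
    return pct_region_length, pct_identity

# Example from the text: identities at residues 5, 9-15, and 17-19
# of a 20-residue query; region of strongest alignment is 9-19.
identical = {5, *range(9, 16), *range(17, 20)}
length_pct, identity_pct = alignment_metrics(20, identical, 9, 19)
```

Here the region spans 11 residues and contains 10 matches, so the sketch yields 55% for the alignment region length and approximately 90.9% for the sequence identity, in agreement with the example.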

P value is the probability that the alignment was produced by chance. For a single alignment, 
the p value can be calculated according to Karlin et al., Proc. Natl. Acad. Sci. (1990) 87:2264 and 
Karlin et al., Proc. Natl. Acad. Sci. (1993) 90. The p value of multiple alignments using the same 
query sequence can be calculated using an heuristic approach described in Altschul et al., Nat. Genet. 

(1994) 6:119. Alignment programs, such as BLAST or TeraBLAST, can calculate the p value. See 
also Altschul et al., Nucleic Acids Res. (1997) 25:3389-3402. 
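
For a single ungapped local alignment, the Karlin and Altschul statistics relate a raw score S to an expected number of chance alignments E = K·m·n·exp(-λS), where m and n are the query and database lengths, and the p value is then P = 1 - exp(-E). The sketch below illustrates that relationship only. The function name `karlin_altschul_p` is hypothetical, and the default values of λ and K are placeholders chosen for the example; in practice both parameters depend on the scoring system and residue frequencies and are reported by the alignment program.

```python
import math

def karlin_altschul_p(score, query_len, db_len, lam=0.318, k=0.13):
    """Return the probability of at least one chance alignment scoring
    >= score, using E = K*m*n*exp(-lambda*S) and P = 1 - exp(-E).

    lam and k are illustrative placeholders, not authoritative values.
    """
    expected = k * query_len * db_len * math.exp(-lam * score)
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    return -math.expm1(-expected)
```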

Another factor to consider for determining identity or similarity is the location of the 
similarity or identity. Strong local alignment can indicate similarity even if the length of alignment is 
short. Sequence identity scattered throughout the length of the query sequence also can indicate a 

similarity between the query and profile sequences. The boundaries of the region where the sequences 
align can be determined according to Doolittle, supra; BLAST 2.0 (see, e.g., Altschul, et al. Nucleic 
Acids Res. (1997) 25:3389-3402), TeraBLAST (available from TimeLogic Corp., Crystal Bay, 
Nevada), or FAST programs; or by determining the area where sequence identity is highest. 

High Similarity. In general, in alignment results considered to be of high similarity, the 

percent of the alignment region length is typically at least about 55% of the total length of the query sequence; 
more typically, at least about 58%; even more typically, at least about 60% of the total residue length 
of the query sequence. Usually, percent length of the alignment region can be as much as about 62%; 
more usually, as much as about 64%; even more usually, as much as about 66%. Further, for high 
similarity, the region of alignment, typically, exhibits at least about 75% of sequence identity, more 
typically, at least about 78%; even more typically, at least about 80% sequence identity. Usually, 
percent sequence identity can be as much as about 82%; more usually, as much as about 84%; even 
more usually, as much as about 86%. 

The p value is used in conjunction with these methods. If high similarity is found, the query 
sequence is considered to have high similarity with a profile sequence when the p value is less than or 
equal to about 10e-2; more usually, less than or equal to about 10e-3; even more usually, less than or 
equal to about 10e-4. More typically, the p value is no more than about 10e-5; more typically, no 
more than about 10e-10; even more typically, no more than about 10e-15 for 
the query sequence to be considered high similarity. 

Weak Similarity. In general, where alignment results are considered to be of weak similarity, 
there is no minimum percent length of the alignment region nor minimum length of alignment. A 
better showing of weak similarity is considered when the region of alignment is, typically, at least 
about 15 amino acid residues in length; more typically, at least about 20; even more typically, at least 
about 25 amino acid residues in length. Usually, length of the alignment region can be as much as 
about 30 amino acid residues; more usually, as much as about 40; even more usually, as much as 
about 60 amino acid residues. Further, for weak similarity, the region of alignment, typically, exhibits 
at least about 35% of sequence identity; more typically, at least about 40%; even more typically, at 
least about 45% sequence identity. Usually, percent sequence identity can be as much as about 50%; 
more usually, as much as about 55%; even more usually, as much as about 60%. 

If low similarity is found, the query sequence is considered to have weak similarity with a 
profile sequence when the p value is usually less than or equal to about 10e-2; more usually, less than 
or equal to about 10e-3; even more usually, less than or equal to about 10e-4. More typically, the p 
value is no more than about 10e-5; more usually, no more than about 10e-10; even more 
usually, no more than about 10e-15 for the query sequence to be considered weak 
similarity. 
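
The categorization described above can be sketched as a simple decision function. This is an illustrative sketch under several assumptions, none of which is mandated by the text: the loosest "at least about" thresholds are used (55% alignment region length and 75% identity for high similarity; 35% identity for weak similarity); the 15-residue region length, which the text describes only as giving a better showing of weak similarity, is treated here as a requirement; and the cutoff written as "10e-2" is read as a p value of 0.01. The function name `classify_alignment` is hypothetical.

```python
def classify_alignment(pct_region_length, pct_identity, region_len,
                       p_value, p_cutoff=1e-2):
    """Return 'high', 'weak', or 'none' for one alignment result.

    Thresholds are illustrative assumptions drawn from the loosest
    values recited in the text.
    """
    if p_value > p_cutoff:
        return "none"
    if pct_region_length >= 55.0 and pct_identity >= 75.0:
        return "high"
    if region_len >= 15 and pct_identity >= 35.0:
        return "weak"
    return "none"
```

Under these assumptions, the worked example given earlier (55% alignment region length, about 90.9% identity) would be classed as high similarity provided its p value is at or below the cutoff.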

Similarity Determined by Sequence Identity Alone. Sequence identity alone can be used to 
determine similarity of a query sequence to an individual sequence and can indicate the activity of the 
sequence. Such an alignment, preferably, permits gaps to align sequences. Typically, the query 
sequence is related to the profile sequence if the sequence identity over the entire query sequence is at 
least about 15%; more typically, at least about 20%; even more typically, at least about 25%; even 
more typically, at least about 50%. Sequence identity alone as a measure of similarity is most useful 
when the query sequence is usually, at least 80 residues in length; more usually, at least 90 residues in 
length; even more usually, at least 95 amino acid residues in length. More typically, similarity can be 
concluded based on sequence identity alone when the query sequence is preferably 100 residues in 
length; more preferably, 120 residues in length; even more preferably, 150 amino acid residues in 
length. 
Alignments with Profile and Multiple Aligned Sequences. Translations of the provided 
polynucleotides can be aligned with amino acid profiles that define either protein families or common 
motifs. Also, translations of the provided polynucleotides can be aligned to multiple sequence 
alignments (MSA) comprising the polypeptide sequences of members of protein families or motifs. 
Similarity or identity with profile sequences or MSAs can be used to determine the activity of the gene 
products (e.g., polypeptides) encoded by the provided polynucleotides or corresponding cDNA or 
genes. For example, sequences that show an identity or similarity with a chemokine profile or MSA 
can exhibit chemokine activities. 

Profiles can be designed manually by (1) creating an MSA, which is an alignment of the 
amino acid sequence of members that belong to the family and (2) constructing a statistical 
representation of the alignment. Such methods are described, for example, in Birney et al., Nucl. Acids 
Res. (1996) 24(14): 2730-2739. MSAs of some protein families and motifs are publicly available. 
For example, the Genome Sequencing Center at the Washington University School of Medicine 
provides a web site (Pfam) that offers MSAs of 547 different families and motifs. These MSAs 
are described also in Sonnhammer et al., Proteins (1997) 28: 405-420. Other sources over the world 
wide web include the site supported by the European Molecular Biology Laboratories in Heidelberg, 
Germany. A brief description of these MSAs is reported in Pascarella et al., Prot. Eng. (1996) 
9(3):249-251. Techniques for building profiles from MSAs are described in Sonnhammer et al., supra; 
Birney et al., supra; and "Computer Methods for Macromolecular Sequence Analysis," Methods in 
Enzymology (1996) 266, Doolittle, Academic Press, Inc., San Diego, California, USA. 

Similarity between a query sequence and a protein family or motif can be determined by (a) 
comparing the query sequence against the profile and/or (b) aligning the query sequence with the 
members of the family or motif. Typically, a program such as Searchwise is used to compare the 
query sequence to the statistical representation of the multiple alignment, also known as a profile (see 
Birney et al., supra). Other techniques to compare the sequence and profile are described in 
Sonnhammer et al., supra and Doolittle, supra. 

Next, methods described by Feng et al., J. Mol. Evol. (1987) 25:351 and Higgins et al., 
CABIOS (1989) 5:151 can be used to align the query sequence with the members of a family or motif, 
also known as an MSA. Sequence alignments can be generated using any of a variety of software tools. 
Examples include PileUp, which creates a multiple sequence alignment, and is described in Feng et 
al., J. Mol. Evol. (1987) 25:351. Another method, GAP, uses the alignment method of Needleman et 
al., J. Mol. Biol. (1970) 48:443. GAP is best suited for global alignment of sequences. A third 
method, BestFit, functions by inserting gaps to maximize the number of matches using the local 
homology algorithm of Smith et al., Adv. Appl. Math. (1981) 2:482. In general, the following factors 
are used to determine if a similarity between a query sequence and a profile or MSA exists: (1) 
number of conserved residues found in the query sequence, (2) percentage of conserved residues 
found in the query sequence, (3) number of frameshifts, and (4) spacing between conserved residues. 

Some alignment programs that both translate and align sequences can make any number of 
frameshifts when translating the nucleotide sequence to produce the best alignment. The fewer 
frameshifts needed to produce an alignment, the stronger the similarity or identity between the query 
and the profile or MSA. For example, a weak similarity resulting from no frameshifts can be a better 
indication of activity or structure of a query sequence than a strong similarity resulting from two 
frameshifts. Preferably, three or fewer frameshifts are found in an alignment; more preferably, two or 
fewer frameshifts; even more preferably, one or fewer frameshifts; even more preferably, no 
frameshifts are found in an alignment of the query and the profile or MSA. 

Conserved residues are those amino acids found at a particular position in all or some of the 
family or motif members. Alternatively, a position is considered conserved if only a certain class of 
amino acids is found in a particular position in all or some of the family members. For example, the 
N-terminal position can contain a positively charged amino acid, such as lysine, arginine, or histidine. 

Typically, a residue of a polypeptide is conserved when a class of amino acids or a single 
amino acid is found at a particular position in at least about 40% of all class members; more typically, 
at least about 50%; even more typically, at least about 60% of the members. Usually, a residue is 
conserved when a class or single amino acid is found in at least about 70% of the members of a family 
or motif; more usually, at least about 80%; even more usually, at least about 90%; even more usually, 
at least about 95%. 

A residue is also considered conserved when three unrelated amino acids are found at a particular 
position in some or all of the members; more usually, two unrelated amino acids. These residues are 
conserved when the unrelated amino acids are found at particular positions in at least about 40% of all 
class members; more typically, at least about 50%; even more typically, at least about 60% of the 
members. Usually, a residue is conserved when a class or single amino acid is found in at least about 
70% of the members of a family or motif; more usually, at least about 80%; even more usually, at least 
about 90%; even more usually, at least about 95%. 

A query sequence has similarity to a profile or MSA when the query sequence comprises at 
least about 25% of the conserved residues of the profile or MSA; more usually, at least about 30%; 
even more usually, at least about 40%. Typically, the query sequence has a stronger similarity to a 
profile sequence or MSA when the query sequence comprises at least about 45% of the conserved 
residues of the profile or MSA; more typically, at least about 50%; even more typically, at least about 
55%. 
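
For illustration, the conserved-residue criteria above can be sketched as follows. The sketch is simplified: it treats a position as conserved only when a single amino acid (not a class of amino acids) reaches the stated fraction of family members, and it assumes the query has already been aligned to the same columns as the MSA. The function names are hypothetical.

```python
from collections import Counter


def conserved_positions(msa: list[str], min_fraction: float = 0.40) -> dict[int, str]:
    """Map column index -> residue for columns where one amino acid occurs
    in at least min_fraction of all members of the alignment."""
    n_members = len(msa)
    conserved = {}
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] != "-")
        if not counts:
            continue  # column contains only gaps
        residue, count = counts.most_common(1)[0]
        if count / n_members >= min_fraction:
            conserved[col] = residue
    return conserved


def fraction_conserved_in_query(aligned_query: str, conserved: dict[int, str]) -> float:
    """Fraction of the conserved residues that are also present in the query."""
    if not conserved:
        return 0.0
    hits = sum(1 for col, res in conserved.items() if aligned_query[col] == res)
    return hits / len(conserved)
```

Under the criteria recited above, a returned fraction of at least about 0.25 would indicate similarity and a fraction of at least about 0.45 would indicate stronger similarity.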

Identification of Secreted & Membrane-Bound Polypeptides. Both secreted and membrane-bound 
polypeptides of the present invention are of particular interest. For example, levels of secreted 
polypeptides can be assayed in body fluids that are convenient, such as blood, plasma, serum, and 
other body fluids such as urine, prostatic fluid and semen. Membrane-bound polypeptides are useful 
for constructing vaccine antigens or inducing an immune response. Such antigens would comprise all 
or part of the extracellular region of the membrane-bound polypeptides. Because both secreted and 
membrane-bound polypeptides comprise a fragment of contiguous hydrophobic amino acids, 
hydrophobicity predicting algorithms can be used to identify such polypeptides. 

A signal sequence is usually encoded by both secreted and membrane-bound polypeptide 
genes to direct a polypeptide to the surface of the cell. The signal sequence usually comprises a 
stretch of hydrophobic residues. Such signal sequences can fold into helical structures. Membrane-bound 
polypeptides typically comprise at least one transmembrane region that possesses a stretch of 
hydrophobic amino acids that can traverse the membrane. Some transmembrane regions also 
exhibit a helical structure. Hydrophobic fragments within a polypeptide can be identified by using 
computer algorithms. Such algorithms include Hopp & Woods, Proc. Natl. Acad. Sci. USA (1981) 
78:3824-3828; Kyte & Doolittle, J. Mol. Biol. (1982) 157:105-132; and RAOAR algorithm, Degli 
Esposti et al., Eur. J. Biochem. (1990) 190:207-219. 
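
As a non-limiting sketch of one such algorithm, a sliding-window hydropathy profile using the Kyte & Doolittle scale can be computed as shown below. The window size of 19 residues and the cutoff of 1.6 are commonly cited settings for locating candidate transmembrane segments but are assumptions here, as are the function names; none of these is prescribed by the present description.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Average hydropathy of every full window along the polypeptide.

    Raises KeyError for non-standard residue codes such as 'X' or '*'.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """Start positions (0-based) of windows whose mean hydropathy meets the cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i for i, mean in enumerate(profile) if mean >= threshold]
```

A polypeptide for which hydrophobic_windows returns a non-empty list contains at least one hydrophobic fragment under these assumed settings.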

Another method of identifying secreted and membrane-bound polypeptides is to translate the 
polynucleotides of the invention in all six frames and determine if at least 8 contiguous hydrophobic 
amino acids are present. Those translated polypeptides with at least 8; more typically, 10; even more 
typically, 12 contiguous hydrophobic amino acids are considered to be either a putative secreted or 
membrane-bound polypeptide. Hydrophobic amino acids include alanine, glycine, histidine, 
isoleucine, leucine, lysine, methionine, phenylalanine, proline, threonine, tryptophan, tyrosine, and 
valine. 
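
The six-frame method just described can be sketched as follows, for illustration only. The sketch assumes the standard genetic code, translates straight through stop codons (shown as "*", which interrupt a hydrophobic run), and uses the set of hydrophobic amino acids exactly as listed above. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Ala, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Thr, Trp, Tyr, Val (as listed in the text).
HYDROPHOBIC = set("AGHILKMFPTWYV")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translations(dna: str) -> list[str]:
    """Translate three forward and three reverse frames; unknown codons become 'X'."""
    frames = []
    for strand in (dna.upper(), reverse_complement(dna)):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


def longest_hydrophobic_run(protein: str) -> int:
    """Length of the longest stretch of contiguous hydrophobic residues."""
    best = current = 0
    for aa in protein:
        current = current + 1 if aa in HYDROPHOBIC else 0
        best = max(best, current)
    return best


def is_putative_secreted_or_membrane(dna: str, min_run: int = 8) -> bool:
    """True if any of the six translations has min_run contiguous hydrophobic residues."""
    return any(longest_hydrophobic_run(p) >= min_run for p in six_frame_translations(dna))
```

Passing min_run=10 or min_run=12 applies the more typical thresholds recited above.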

Identification of the Function of an Expression Product of a Full-Length Gene 

Ribozymes, antisense constructs, and dominant negative mutants can be used to determine 
function of the expression product of a gene corresponding to a polynucleotide provided herein. 
These methods and compositions are particularly useful where the provided novel polynucleotide 
exhibits no significant or substantial homology to a sequence encoding a gene of known function. 
Antisense molecules and ribozymes can be constructed from synthetic polynucleotides. Typically, the 
phosphoramidite method of oligonucleotide synthesis is used. See Beaucage et al., Tet. Lett. (1981) 
22:1859 and USPN 4,668,777. Automated devices for synthesis are available to create 
oligonucleotides using this chemistry. Examples of such devices include Biosearch 8600, Models 392 
and 394 by Applied Biosystems, a division of Perkin-Elmer Corp., Foster City, California, USA; and 
Expedite by PerSeptive Biosystems, Framingham, Massachusetts, USA. Synthetic RNA, phosphate 
analog oligonucleotides, and chemically derivatized oligonucleotides can also be produced, and can be 
covalently attached to other molecules. RNA oligonucleotides can be synthesized, for example, using 
RNA phosphoramidites. This method can be performed on an automated synthesizer, such as Applied 
Biosystems, Models 392 and 394, Foster City, California, USA. 

Phosphorothioate oligonucleotides can also be synthesized for antisense construction. A 
sulfurizing reagent, such as tetraethylthiuram disulfide (TETD) in acetonitrile, can be used to convert 
the internucleotide cyanoethyl phosphite to the phosphorothioate triester within 15 minutes at room 
temperature. TETD replaces the iodine reagent, while all other reagents used for standard 
phosphoramidite chemistry remain the same. Such a synthesis method can be automated using 
Models 392 and 394 by Applied Biosystems, for example. 

Oligonucleotides of up to 200 nt can be synthesized, more typically, 100 nt; more typically 50 
nt; even more typically, 30 to 40 nt. These synthetic fragments can be annealed and ligated together to 
construct larger fragments. See, for example, Sambrook et al., supra. Trans-cleaving catalytic RNAs 
(ribozymes) are RNA molecules possessing endoribonuclease activity. Ribozymes are specifically 
designed for a particular target, and the target message must contain a specific nucleotide sequence. 
They are engineered to cleave any RNA species site-specifically in the background of cellular RNA. 
The cleavage event renders the mRNA unstable and prevents protein expression. Importantly, 
ribozymes can be used to inhibit expression of a gene of unknown function for the purpose of 
determining its function in an in vitro or in vivo context, by detecting the phenotypic effect. One 
commonly used ribozyme motif is the hammerhead, for which the substrate sequence requirements are 
minimal. Design of the hammerhead ribozyme, as well as therapeutic uses of ribozymes, are disclosed 
in Usman et al., Current Opin. Struct. Biol. (1996) 6:527. Methods for production of ribozymes, 
including hairpin structure ribozyme fragments, methods of increasing ribozyme specificity, and the 
like are known in the art. 

The hybridizing region of the ribozyme can be modified or can be prepared as a branched 
structure as described in Horn and Urdea, Nucleic Acids Res. (1989) 17:6959. The basic structure of 
the ribozymes can also be chemically altered in ways familiar to those skilled in the art, and 
chemically synthesized ribozymes can be administered as synthetic oligonucleotide derivatives 
modified by monomeric units. In a therapeutic context, liposome mediated delivery of ribozymes 
improves cellular uptake, as described in Birikh et al., Eur. J. Biochem. (1997) 245:1. 

Antisense nucleic acids are designed to specifically bind to RNA, resulting in the formation of 
RNA-DNA or RNA-RNA hybrids, with an arrest of DNA replication, reverse transcription or 
messenger RNA translation. Antisense polynucleotides based on a selected polynucleotide sequence 
can interfere with expression of the corresponding gene. Antisense polynucleotides are typically 
generated within the cell by expression from antisense constructs that contain the antisense strand as 
the transcribed strand. Antisense polynucleotides based on the disclosed polynucleotides will bind 
and/or interfere with the translation of mRNA comprising a sequence complementary to the antisense 
polynucleotide. The expression products of control cells and cells treated with the antisense construct 
are compared to detect the protein product of the gene corresponding to the polynucleotide upon 
which the antisense construct is based. The protein is isolated and identified using routine 
biochemical methods. 
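
Because an antisense polynucleotide is complementary to its target mRNA, candidate antisense sequences can be derived computationally as reverse complements. The following is a minimal sketch under stated assumptions: the function names, the 30 nt oligonucleotide length, and the 10 nt tiling step are hypothetical illustration values, and no claim is made that any resulting oligonucleotide is effective.

```python
def antisense_oligo(target_mrna: str) -> str:
    """Antisense DNA oligonucleotide (written 5'->3') for an mRNA segment (5'->3')."""
    seq = target_mrna.upper().replace("U", "T")
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tile_antisense(target_mrna: str, length: int = 30, step: int = 10) -> list[tuple[int, str]]:
    """Candidate antisense oligos tiled along the target as (start, oligo) pairs."""
    return [
        (start, antisense_oligo(target_mrna[start:start + length]))
        for start in range(0, len(target_mrna) - length + 1, step)
    ]
```

For example, antisense_oligo("AUGGCC") returns "GGCCAT".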

Given the extensive background literature and clinical experience in antisense therapy, one 
skilled in the art can use selected polynucleotides of the invention as additional potential therapeutics. 
The choice of polynucleotide can be narrowed by first testing them for binding to "hot spot" regions 
of the genome of cancerous cells. If a polynucleotide is identified as binding to a "hot spot," testing 
the polynucleotide as an antisense compound in the corresponding cancer cells is warranted. 

As an alternative method for identifying function of the gene corresponding to a 
polynucleotide disclosed herein, dominant negative mutations are readily generated for corresponding 
proteins that are active as homomultimers. A mutant polypeptide will interact with wild-type 
polypeptides (made from the other allele) and form a non-functional multimer. Thus, a mutation is in 
a substrate-binding domain, a catalytic domain, or a cellular localization domain. Preferably, the 
mutant polypeptide will be overproduced. Point mutations are made that have such an effect. In 
addition, fusion of different polypeptides of various lengths to the terminus of a protein can yield 
dominant negative mutants. General strategies are available for making dominant negative mutants 
(see, e.g., Herskowitz, Nature (1987) 329:219). Such techniques can be used to create loss of function 
mutations, which are useful for determining protein function. 
Polypeptides and Variants Thereof 

The polypeptides of the invention include those encoded by the disclosed polynucleotides, as 
well as nucleic acids that, by virtue of the degeneracy of the genetic code, are not identical in sequence 
to the disclosed polynucleotides. Thus, the invention includes within its scope a polypeptide encoded 
by a polynucleotide having the sequence of any one of SEQ ID NOS: 1-1485 or a variant thereof. Also 
included in the invention are the polypeptides comprising the amino acid sequences of SEQ ID 
NOS: 1486-1542. 

In general, the term "polypeptide" as used herein refers to the full length polypeptide 
encoded by the recited polynucleotide, the polypeptide encoded by the gene represented by the recited 
polynucleotide, as well as portions or fragments thereof. "Polypeptides" also includes variants of the 
naturally occurring proteins, where such variants are homologous or substantially similar to the 
naturally occurring protein, and can be of an origin of the same or different species as the naturally 
occurring protein (e.g., human, murine, or some other species that naturally expresses the recited 
polypeptide, usually a mammalian species). In general, variant polypeptides have a sequence that has 
at least about 80%, usually at least about 90%, and more usually at least about 98% sequence identity 
with a differentially expressed polypeptide of the invention, as measured by BLAST 2.0 or 
TeraBLAST using the parameters described above. The variant polypeptides can be naturally or 
non-naturally glycosylated, i.e., the polypeptide has a glycosylation pattern that differs from the 
glycosylation pattern found in the corresponding naturally occurring protein. 

The invention also encompasses homologs of the disclosed polypeptides (or fragments 
thereof) where the homologs are isolated from other species, i.e., other animal or plant species, where 
such homologs are usually from mammalian species, e.g., rodents, such as mice and rats; domestic 
animals, e.g., horse, cow, dog, cat; and humans. By "homolog" is meant a polypeptide having at least 
about 35%, usually at least about 40% and more usually at least about 60% amino acid sequence identity 
to a particular differentially expressed protein as identified above, where sequence identity is determined 
using the BLAST 2.0 or TeraBLAST algorithm, with the parameters described supra. 

In general, the polypeptides of the subject invention are provided in a non-naturally occurring 
environment, e.g., are separated from their naturally occurring environment. In certain embodiments, 
the subject protein is present in a composition that is enriched for the protein as compared to a control. 
As such, purified polypeptide is provided, where by purified is meant that the protein is present in a 
composition that is substantially free of non-differentially expressed polypeptides, where by 
substantially free is meant that less than 90%, usually less than 60% and more usually less than 50% 
of the composition is made up of non-differentially expressed polypeptides. 

Also within the scope of the invention are variants; variants of polypeptides include mutants, 
fragments, and fusions. Mutants can include amino acid substitutions, additions or deletions. The 
amino acid substitutions can be conservative amino acid substitutions or substitutions to eliminate 
non-essential amino acids, such as to alter a glycosylation site, a phosphorylation site or an acetylation 
site, or to minimize misfolding by substitution or deletion of one or more cysteine residues that are not 
necessary for function. Conservative amino acid substitutions are those that preserve the general 
charge, hydrophobicity/hydrophilicity, and/or steric bulk of the amino acid substituted. Variants can 
be designed so as to retain or have enhanced biological activity of a particular region of the protein 
(e.g., a functional domain and/or, where the polypeptide is a member of a protein family, a region 
associated with a consensus sequence). Selection of amino acid alterations for production of variants 
can be based upon the accessibility (interior vs. exterior) of the amino acid (see, e.g., Go et al., Int. J. 
Peptide Protein Res. (1980) 15:211), the thermostability of the variant polypeptide (see, e.g., Querol et 
al., Prot. Eng. (1996) 9:265), desired glycosylation sites (see, e.g., Olsen and Thomsen, J. Gen. 
Microbiol. (1991) 137:579), desired disulfide bridges (see, e.g., Clarke et al., Biochemistry (1993) 
32:4322; and Wakarchuk et al., Protein Eng. (1994) 7:1379), desired metal binding sites (see, e.g., 
Toma et al., Biochemistry (1991) 30:97, and Haezebrouck et al., Protein Eng. (1993) 6:643), and 
desired substitutions within proline loops (see, e.g., Masui et al., Appl. Env. Microbiol. (1994) 
60:3579). Cysteine-depleted muteins can be produced as disclosed in USPN 4,959,314. 

Variants also include fragments of the polypeptides disclosed herein, particularly haptens, 
biologically active fragments, and/or fragments corresponding to functional domains. Fragments of 
interest will typically be at least about 10 aa to at least about 15 aa in length, usually at least about 50 
aa in length, and can be as long as 300 aa in length or longer, but will usually not exceed about 1000 
aa in length, where the fragment will have a stretch of amino acids that is identical to a polypeptide 
encoded by a polynucleotide having a sequence of any of SEQ ID NOS: 1-1485, a polypeptide 
comprising a sequence of at least one of SEQ ID NOS: 1486-1542, or a homolog thereof. The protein 
variants described herein are encoded by polynucleotides that are within the scope of the invention. 
The genetic code can be used to select the appropriate codons to construct the corresponding variants. 
A fragment of a subject polypeptide is, for example, a polypeptide 
having an amino acid sequence which is a portion of a subject polypeptide, e.g., a polypeptide encoded 
by a subject polynucleotide that is identified by any one of the sequences of the sequence listing or its 
complement. The polypeptide fragments of the invention are preferably at least about 9 aa, at least 
about 15 aa, and more preferably at least about 20 aa, still more preferably at least about 30 aa, and 
even more preferably, at least about 40 aa, at least about 50 aa, at least about 75 aa, at least about 100 
aa, at least about 125 aa or at least about 150 aa in length. A fragment "at least 20 aa in length," for 
example, is intended to include 20 or more contiguous amino acids from, for example, the polypeptide 
encoded by a cDNA, in a cDNA clone contained in a deposited library, or a nucleotide sequence 
shown in the sequence listing or the complementary strand thereof. In this context "about" includes the 
particularly recited value or a value larger or smaller by several (5, 4, 3, 2, or 1) amino acids. These 
polypeptide fragments have uses that include, but are not limited to, production of antibodies as 
discussed herein. Of course, larger fragments (e.g., at least 150, 175, 200, 250, 500, 600, 1000, or 
2000 amino acids in length) are also encompassed by the invention. 

Moreover, representative examples of polypeptide fragments of the invention (useful, for 
example, as antigens for antibody production) include, for example, fragments comprising, or 
alternatively consisting of, a sequence from about amino acid number 1-10, 5-10, 10-20, 21-31, 31-40, 
41-61, 61-81, 91-120, 121-140, 141-162, 162-200, 201-240, 241-280, 281-320, 321-360, 360-400, 
400-450, 451-500, 500-600, 600-700, 700-800, 800-900 and the like. In this context "about" includes 
the particularly recited range or a range larger or smaller by several (5, 4, 3, 2, or 1) amino acids, at 
either terminus or at both termini. In some embodiments, these fragments have a functional activity 
(e.g., biological activity) whereas in other embodiments, these fragments may be used to make an 
antibody. 

Further polypeptide variants are described in PCT publications WO/00-55173, WO/01-07611 
and WO/02-16429. 

Computer-Related Embodiments 

In general, a library of polynucleotides is a collection of sequence information, which 
information is provided in either biochemical form (e.g., as a collection of polynucleotide molecules), 
or in electronic form (e.g., as a collection of polynucleotide sequences stored in a computer-readable 
form, as in a computer system and/or as part of a computer program). The sequence information of 
the polynucleotides can be used in a variety of ways, e.g., as a resource for gene discovery, as a 
representation of sequences expressed in a selected cell type (e.g., cell type markers), and/or as 
markers of a given disease or disease state. In general, a disease marker is a representation of a gene 
product that is present in all cells affected by disease either at an increased or decreased level relative 
to a normal cell (e.g., a cell of the same or similar type that is not substantially affected by disease). 
For example, a polynucleotide sequence in a library can be a polynucleotide that represents an mRNA, 
polypeptide, or other gene product encoded by the polynucleotide, that is either overexpressed or 
underexpressed in a breast ductal cell affected by cancer relative to a normal (i.e., substantially 
disease-free) breast cell. 

The nucleotide sequence information of the library can be embodied in any suitable form, e.g., 
electronic or biochemical forms. For example, a library of sequence information embodied in 
electronic form comprises an accessible computer data file (or, in biochemical form, a collection of 
nucleic acid molecules) that contains the representative nucleotide sequences of genes that are 
differentially expressed (e.g., overexpressed or underexpressed) as between, for example, i) a 
cancerous cell and a normal cell; ii) a cancerous cell and a dysplastic cell; iii) a cancerous cell and a 
cell affected by a disease or condition other than cancer; iv) a metastatic cancerous cell and a normal 
cell and/or non-metastatic cancerous cell; v) a malignant cancerous cell and a non-malignant 
cancerous cell (or a normal cell) and/or vi) a dysplastic cell relative to a normal cell. Other 
combinations and comparisons of cells affected by various diseases or stages of disease will be readily 
apparent to the ordinarily skilled artisan. Biochemical embodiments of the library include a collection 
of nucleic acids that have the sequences of the genes in the library, where the nucleic acids can 
correspond to the entire gene in the library or to a fragment thereof, as described in greater detail 
below. 

The polynucleotide libraries of the subject invention generally comprise sequence information 
of a plurality of polynucleotide sequences, where at least one of the polynucleotides has a sequence of 
any of SEQ ID NOS: 1-1485. By plurality is meant at least 2, usually at least 3 and can include up to 
all of SEQ ID NOS: 1-1485. The length and number of polynucleotides in the library will vary with 
the nature of the library, e.g., if the library is an oligonucleotide array, a cDNA array, a computer 
database of the sequence information, etc. 

Where the library is an electronic library, the nucleic acid sequence information can be 
present in a variety of media. "Media" refers to a manufacture, other than an isolated nucleic acid 
molecule, that contains the sequence information of the present invention. Such a manufacture 
provides the genome sequence or a subset thereof in a form that can be examined by means not 
directly applicable to the sequence as it exists in a nucleic acid. For example, the nucleotide sequence 
of the present invention, e.g., the nucleic acid sequences of any of the polynucleotides of SEQ ID 
NOS: 1-1485, can be recorded on computer readable media, e.g., any medium that can be read and 
accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, 
such as a floppy disc, a hard disc storage medium, and a magnetic tape; optical storage media such as 
CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as 
magnetic/optical storage media. One of skill in the art can readily appreciate how any of the presently 
known computer readable media can be used to create a manufacture comprising a recording of the 
present sequence information. "Recorded" refers to a process for storing information on computer 
readable medium, using any such methods as known in the art. Any convenient data storage structure 
can be chosen, based on the means used to access the stored information. A variety of data processor 
programs and formats can be used for storage, e.g., word processing text file, database format, etc. In 
addition to the sequence information, electronic versions of the libraries of the invention can be 
provided in conjunction or connection with other computer-readable information and/or other types of 
computer-readable files (e.g., searchable files, executable files, etc., including, but not limited to, for 
example, search program software, etc.). 

By providing the nucleotide sequence in computer readable form, the information can be 
accessed for a variety of purposes. Computer software to access sequence information is publicly 
available. For example, the gapped BLAST (Altschul et al., Nucleic Acids Res. (1997) 25:3389-3402) 
and BLAZE (Brutlag et al., Comp. Chem. (1993) 17:203) search algorithms on a Sybase system, or the 
TeraBLAST (TimeLogic, Crystal Bay, Nevada) program optionally running on a specialized computer 
platform available from TimeLogic, can be used to identify open reading frames (ORFs) within the 
genome that contain homology to ORFs from other organisms. 

As used herein, "a computer-based system" refers to the hardware means, software means, and 
data storage means used to analyze the nucleotide sequence information of the present invention. The 
minimum hardware of the computer-based systems of the present invention comprises a central 
processing unit (CPU), input means, output means, and data storage means. A skilled artisan can 
readily appreciate that any one of the currently available computer-based systems is suitable for use in 
the present invention. The data storage means can comprise any manufacture comprising a recording 
of the present sequence information as described above, or a memory access means that can access 
such a manufacture. 

"Search means" refers to one or more programs implemented on the computer-based system, 
to compare a target sequence or target structural motif, or expression levels of a polynucleotide in a 
sample, with the stored sequence information. Search means can be used to identify fragments or 
regions of the genome that match a particular target sequence or target motif. A variety of 
algorithms are publicly known and commercially available, e.g., MacPattern (EMBL), BLASTN and 
BLASTX (NCBI), TeraBLAST (TimeLogic, Crystal Bay, Nevada). A "target sequence" can be any 
polynucleotide or amino acid sequence of six or more contiguous nucleotides or two or more amino 
acids, preferably from about 10 to 100 amino acids or from about 30 to 300 nt. A variety of comparing 
means can be used to accomplish comparison of sequence information from a sample (e.g., to analyze 
target sequences, target motifs, or relative expression levels) with the data storage means. A skilled 
artisan can readily recognize that any one of the publicly available homology search programs can be 
used as the search means for the computer based systems of the present invention to accomplish 
comparison of target sequences and motifs. Computer programs to analyze expression levels in a 
sample and in controls are also known in the art. 
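
As a deliberately simple illustration of a search means, the following sketch performs an exact-match search for a target sequence on both strands of each entry in an electronic library. It is not BLAST or any of the programs named above, and it does not detect inexact homology. The dictionary layout, the entry names, and the function names are hypothetical.

```python
def reverse_complement(dna: str) -> str:
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_target(library: dict[str, str], target: str, min_length: int = 6):
    """Return (entry name, strand, 0-based position) for every exact match.

    A "-" strand hit reports where the reverse complement of the target
    occurs in the stored (forward) sequence.
    """
    target = target.upper()
    if len(target) < min_length:
        raise ValueError(f"target must be at least {min_length} nucleotides")
    probes = [("+", target)]
    rc = reverse_complement(target)
    if rc != target:  # avoid reporting palindromic targets twice
        probes.append(("-", rc))
    hits = []
    for name, sequence in library.items():
        sequence = sequence.upper()
        for strand, probe in probes:
            pos = sequence.find(probe)
            while pos != -1:
                hits.append((name, strand, pos))
                pos = sequence.find(probe, pos + 1)  # allow overlapping matches
    return hits
```

The default minimum of six nucleotides mirrors the shortest target sequence contemplated above; a practical system would substitute a homology search program for the exact-match step.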

A "target structural motif," or "target motif," refers to any rationally selected sequence or 
combination of sequences in which the sequence(s) are chosen based on a three-dimensional 
configuration that is formed upon the folding of the target motif, or on consensus sequences of 
regulatory or active sites. There are a variety of target motifs known in the art. Protein target motifs 
include, but are not limited to, enzyme active sites and signal sequences. Nucleic acid target motifs 
include, but are not limited to, hairpin structures, promoter sequences and other expression elements 
such as binding sites for transcription factors. 

A variety of structural formats for the input and output means can be used to input and output 
the information in the computer-based systems of the present invention. One format for an output 
means ranks the relative expression levels of different polynucleotides. Such presentation provides a 
skilled artisan with a ranking of relative expression levels to determine a gene expression profile. 
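
One possible form of such a ranked output is sketched below. The sketch assumes that expression levels are available as non-negative numeric values per polynucleotide for a sample and for a control, and it ranks by log2 ratio with a pseudocount of 1.0 to avoid division by zero. The pseudocount, the data layout, and the function name are illustrative assumptions only.

```python
import math


def rank_expression(sample: dict[str, float], control: dict[str, float],
                    pseudocount: float = 1.0) -> list[tuple[str, float]]:
    """Rank polynucleotides by log2(sample/control), most overexpressed first.

    Only identifiers present in both measurements are ranked; ties are
    broken alphabetically so the output order is reproducible.
    """
    ranked = []
    for name in sample.keys() & control.keys():
        ratio = (sample[name] + pseudocount) / (control[name] + pseudocount)
        ranked.append((name, math.log2(ratio)))
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

Positive values in the ranking indicate overexpression in the sample relative to the control, and negative values indicate underexpression.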

As discussed above, the "library" of the invention also encompasses biochemical libraries of 
the polynucleotides of SEQ ID NOS: 1-1485, e.g., collections of nucleic acids representing the 
provided polynucleotides. The biochemical libraries can take a variety of forms, e.g., a solution of 
cDNAs, a pattern of probe nucleic acids stably associated with a surface of a solid support (i.e., an 
array) and the like. Of particular interest are nucleic acid arrays in which one or more of SEQ ID 
NOS: 1-1485 is represented on the array. By array is meant an article of manufacture that has at least 
a substrate with at least two distinct nucleic acid targets on one of its surfaces, where the number of 
distinct nucleic acids can be considerably higher, typically being at least 10, usually at least 20, and 
often at least 25 distinct nucleic acid molecules. A variety of different array formats have been 
developed and are known to those of skill in the art. The arrays of the subject invention find use in a 
variety of applications, including gene expression analysis, drug screening, mutation analysis and the 
like, as disclosed in the above-listed exemplary patent documents. 

In addition to the above nucleic acid libraries, analogous libraries of polypeptides are also
provided, where the polypeptides of the library will represent at least a portion of the polypeptides
encoded by a gene corresponding to one or more of SEQ ID NOS: 1-1485.
Utilities 

The polynucleotides of the invention are useful in a variety of applications. Exemplary utilities
of the polynucleotides of the invention are described below.

Construction of Larger Molecules: Recombinant DNAs and Nucleic Acid Multimers. In one
embodiment of particular interest, the polynucleotides described herein are useful as the building
blocks for larger molecules. In one example, the polynucleotide is a component of a larger cDNA
molecule which in turn can be adapted for expression in a host cell (e.g., a bacterial or eukaryotic
(e.g., yeast or mammalian) host cell). The cDNA can include, in addition to the polypeptide encoded
by the starting material polynucleotide (i.e., a polynucleotide described herein), an amino acid
sequence that is heterologous to the polypeptide encoded by the polynucleotide described herein (e.g.,
as in a sequence encoding a fusion protein). In some embodiments, the polynucleotides described
herein are used as starting material polynucleotides for synthesizing all or a portion of the gene to
which the described polynucleotide corresponds. For example, a DNA molecule encoding a full-length
human polypeptide can be constructed using a polynucleotide described herein as starting material.

In another embodiment, the polynucleotides of the invention are used in nucleic acid
multimers. Nucleic acid multimers can be linear or branched polymers of the same repeating
single-stranded oligonucleotide unit or different single-stranded oligonucleotide units. Where the
molecules are branched, the multimers are generally described as either "fork" or "comb" structures.
The oligonucleotide units of the multimer may be composed of RNA, DNA, modified nucleotides or
combinations thereof. At least one of the units has a sequence, length, and composition that permits it
to bind specifically to a first single-stranded nucleotide sequence of interest, typically an analyte or
an oligonucleotide bound to the analyte. In order to achieve such specificity and stability, this unit
will normally be 15 to 50 nt, preferably 15 to 30 nt, in length and have a GC content in the range of
40% to 60%. In addition to such unit(s), the multimer includes a multiplicity of units that are capable
of hybridizing specifically and stably to a second single-stranded nucleotide of interest, typically a
labeled oligonucleotide or another multimer. These units will also normally be 15 to 50 nt, preferably
15 to 30 nt, in length and have a GC content in the range of 40% to 60%. When a multimer is
designed to be hybridized to another multimer, the first and second oligonucleotide units are
heterogeneous (different). One or more of the polynucleotides described herein, or a portion of a
polynucleotide described herein, can be used as a repeating unit of such nucleic acid multimers.
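
The length and GC-content ranges given above for an oligonucleotide unit can be checked with a few
lines of Python. This is a minimal sketch only: the function names gc_content and is_suitable_unit are
assumptions made for illustration, the default limits are simply the 15 to 50 nt and 40% to 60% ranges
stated in the text, and sequence-specific considerations such as secondary structure are ignored.

```python
def gc_content(oligo):
    """Fraction of G and C residues in a non-empty oligonucleotide."""
    oligo = oligo.upper()
    return (oligo.count("G") + oligo.count("C")) / len(oligo)

def is_suitable_unit(oligo, min_len=15, max_len=50, min_gc=0.40, max_gc=0.60):
    """True if the unit falls within the stated length and GC-content ranges."""
    # The length check runs first, so an empty string never reaches gc_content.
    if not min_len <= len(oligo) <= max_len:
        return False
    return min_gc <= gc_content(oligo) <= max_gc

# Hypothetical 20-nt unit with 50% GC content.
candidate_ok = is_suitable_unit("ATGCATGCATGCATGCATGC")
```

Passing min_len=15 and max_len=30 applies the preferred length range instead.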

The total number of oligonucleotide units in the multimer will usually be in the range of 3 to
50, more usually 10 to 20. In multimers in which the unit that hybridizes to the nucleotide sequence
of interest is different from the unit that hybridizes to the labeled oligonucleotide, the number ratio of
the latter to the former will usually be 2:1 to 30:1, more usually 5:1 to 20:1, and preferably 10:1 to
15:1.

The oligonucleotide units of the multimer may be covalently linked directly to each other
through phosphodiester bonds or through interposed linking agents such as nucleic acid, amino acid,
carbohydrate or polyol bridges, or through other cross-linking agents that are capable of cross-linking
nucleic acid or modified nucleic acid strands. The site(s) of linkage may be at the ends of the unit (in
either normal 3'-5' orientation or randomly oriented) and/or at one or more internal nucleotides in the
strand. In linear multimers the individual units are linked end-to-end to form a linear polymer. In one
type of branched multimer three or more oligonucleotide units emanate from a point of origin to form
a branched structure. The point of origin may be another oligonucleotide unit or a multifunctional
molecule to which at least three units can be covalently bound. In another type, there is an
oligonucleotide unit backbone with one or more pendant oligonucleotide units. These latter-type
multimers are "fork-like", "comb-like" or combination "fork-" and "comb-like" in structure. The
pendant units will normally depend from a modified nucleotide or other organic moiety having
appropriate functional groups to which oligonucleotides may be conjugated or otherwise attached.
The multimer may be totally linear, totally branched, or a combination of linear and branched
portions. Preferably there will be at least two branch points in the multimer, more preferably at least
3, preferably 5 to 10. The multimer may include one or more segments of double-stranded sequences.
Multimeric nucleic acid molecules are useful in amplifying the signal that results from
hybridization of the first sequence of the multimeric molecule to a target sequence. The
amplification is theoretically proportional to the number of iterations of the second segment.

Without being held to theory, forked structures of greater than about eight branches exhibited
steric hindrance which inhibited binding of labeled probes to the multimer. On the other hand, comb
structures exhibit little or no steric problems and are thus a preferred type of branched multimer. For a
description of branched nucleic acid multimers of both the fork and comb types, as well as methods of
use and synthesis, see, e.g., U.S. Pat. Nos. 5,124,246 (fork-type structures); 5,710,264 (synthesis of
comb structures); and 5,849,481.

Use of Polynucleotide Probes in Mapping, and in Tissue Profiling. Polynucleotide probes,
generally comprising at least 12 contiguous nt of a polynucleotide as shown in the Sequence Listing,
are used for a variety of purposes, such as chromosome mapping of the polynucleotide and detection
of transcription levels. Additional disclosure about preferred regions of the disclosed polynucleotide
sequences is found in the Examples. A probe that hybridizes specifically to a polynucleotide disclosed
herein should provide a detection signal at least 5-, 10-, or 20-fold higher than the background
hybridization provided with other unrelated sequences.
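
The signal-to-background criterion above reduces to a simple ratio test, sketched here in Python. The
function names and the numeric readings are hypothetical; the default threshold of 5 is the lowest of
the fold values stated in the text, and how signal and background are measured is left open.

```python
def specificity_fold(signal, background):
    """Ratio of probe signal to background hybridization signal."""
    if background <= 0:
        raise ValueError("background must be positive")
    return signal / background

def is_specific(signal, background, min_fold=5.0):
    """True if the signal is at least min_fold times the background."""
    return specificity_fold(signal, background) >= min_fold

# Hypothetical readings in arbitrary units: 1200 / 100 gives a 12-fold signal.
passes_5_fold = is_specific(1200, 100)
passes_20_fold = is_specific(1200, 100, min_fold=20.0)
```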

Detection of Expression Levels. Nucleotide probes are used to detect expression of a gene
corresponding to the provided polynucleotide. In Northern blots, mRNA is separated
electrophoretically and contacted with a probe. A probe is detected as hybridizing to an mRNA
species of a particular size. The amount of hybridization is quantitated to determine relative amounts
of expression, for example under a particular condition. Probes are used for in situ hybridization to
cells to detect expression. Probes can also be used in vivo for diagnostic detection of hybridizing
sequences. Probes are typically labeled with a radioactive isotope. Other types of detectable labels
can be used such as chromophores, fluors, and enzymes. Other examples of nucleotide hybridization
assays are described in WO92/02526 and USPN 5,124,246.
Alternatively, the Polymerase Chain Reaction (PCR) is another means for detecting small
amounts of target nucleic acids (see, e.g., Mullis et al., Meth. Enzymol. (1987) 155:335; USPN
4,683,195; and USPN 4,683,202). Two primer polynucleotides that hybridize with the
target nucleic acids are used to prime the reaction. The primers can be composed of sequence within
or 3' and 5' to the polynucleotides of the Sequence Listing. Alternatively, if the primers are 3' and 5' to
these polynucleotides, they need not hybridize to them or the complements. After amplification of the
target with a thermostable polymerase, the amplified target nucleic acids can be detected by methods
known in the art, e.g., Southern blot. mRNA or cDNA can also be detected by traditional blotting
techniques (e.g., Southern blot, Northern blot, etc.) described in Sambrook et al., "Molecular Cloning:
A Laboratory Manual" (New York, Cold Spring Harbor Laboratory, 1989) (e.g., without PCR
amplification). In general, mRNA or cDNA generated from mRNA using a polymerase enzyme can
be purified and separated using gel electrophoresis, and transferred to a solid support, such as
nitrocellulose. The solid support is exposed to a labeled probe, washed to remove any unhybridized
probe, and duplexes containing the labeled probe are detected.
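
The relationship between a primer pair and the region it amplifies can be illustrated with a short
Python sketch. It assumes exact, mismatch-free primer binding and a single binding site for each
primer, which is a simplification of real PCR; the function names and the template and primer
sequences are hypothetical and chosen only for illustration.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def predicted_amplicon(template, forward_primer, reverse_primer):
    """Return the template-strand region bounded by the two primers, or None."""
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement appears on the template strand downstream of the forward primer.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]

# Hypothetical template and primers.
product = predicted_amplicon("TTACGTGCAAGGCTTACCGATGGA", "ACGTGC", "CATCGG")
```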

Mapping. Polynucleotides of the present invention can be used to identify a chromosome on
which the corresponding gene resides. Such mapping can be useful in identifying the function of the
polynucleotide-related gene by its proximity to other genes with known function. Function can also
be assigned to the polynucleotide-related gene when particular syndromes or diseases map to the same
chromosome. For example, use of polynucleotide probes in identification and quantification of
nucleic acid sequence aberrations is described in USPN 5,783,387. An exemplary mapping method is
fluorescence in situ hybridization (FISH), which facilitates comparative genomic hybridization to
allow total genome assessment of changes in relative copy number of DNA sequences (see, e.g.,
Valdes et al., Methods in Molecular Biology (1997) 68:1). Polynucleotides can also be mapped to
particular chromosomes using, for example, radiation hybrids or chromosome-specific hybrid panels.
See Leach et al., Advances in Genetics, (1995) 33:63-99; Walter et al., Nature Genetics (1994) 7:22;
Walter and Goodfellow, Trends in Genetics (1992) 9:352. Panels for radiation hybrid mapping are
available from Research Genetics, Inc., Huntsville, Alabama, USA. Databases for markers using
various panels are available via the world wide web at sites supported by the Stanford Human
Genome Center (Stanford University) and the Whitehead Institute for Biomedical Research/MIT
Center for Genome Research. The statistical program RHMAP can be used to construct a map based
on the data from radiation hybridization with a measure of the relative likelihood of one order versus
another. RHMAP is available via the world wide web at a site supported by the University of
Michigan. In addition, commercial programs are available for identifying regions of chromosomes
commonly associated with disease, such as cancer.

Tissue Typing or Profiling. Expression of specific mRNA corresponding to the provided
polynucleotides can vary in different cell types and can be tissue-specific. This variation of mRNA
levels in different cell types can be exploited with nucleic acid probe assays to determine tissue types.
For example, PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid probes
substantially identical or complementary to polynucleotides listed in the Sequence Listing can
determine the presence or absence of the corresponding cDNA or mRNA.

Tissue typing can be used to identify the developmental organ or tissue source of a metastatic
lesion by identifying the expression of a particular marker of that organ or tissue. If a polynucleotide
is expressed only in a specific tissue type, and a metastatic lesion is found to express that
polynucleotide, then the developmental source of the lesion has been identified. Expression of a
particular polynucleotide can be assayed by detection of either the corresponding mRNA or the
protein product. As would be readily apparent to any forensic scientist, the sequences disclosed herein
are useful in differentiating human tissue from non-human tissue. In particular, these sequences are
useful to differentiate human tissue from bird, reptile, and amphibian tissue, for example.

Use of Polymorphisms. A polynucleotide of the invention can be used in forensics, genetic
analysis, mapping, and diagnostic applications where the corresponding region of a gene is
polymorphic in the human population. Any means for detecting a polymorphism in a gene can be
used, including, but not limited to electrophoresis of protein polymorphic variants, differential
sensitivity to restriction enzyme cleavage, and hybridization to allele-specific probes.

Antibody Production. The present invention further provides antibodies, which may be
isolated antibodies, that are specific for a polypeptide encoded by a polynucleotide described herein
(e.g., a polypeptide encoded by a sequence corresponding to SEQ ID NOS: 1-1485, a polypeptide
comprising an amino acid sequence of SEQ ID NOS: 1486-1542). Antibodies can be provided in a
composition comprising the antibody and a buffer and/or a pharmaceutically acceptable excipient.
Antibodies specific for a polypeptide associated with prostate cancer are useful in a variety of
diagnostic and therapeutic methods, as discussed in detail herein.

Expression products of a polynucleotide of the invention, as well as the corresponding
mRNA, cDNA, or complete gene, can be prepared and used for raising antibodies for experimental,
diagnostic, and therapeutic purposes. For polynucleotides to which a corresponding gene has not been
assigned, this provides an additional method of identifying the corresponding gene. The
polynucleotide or related cDNA is expressed as described above, and antibodies are prepared. These
antibodies are specific to an epitope on the polypeptide encoded by the polynucleotide, and can
precipitate or bind to the corresponding native protein in a cell or tissue preparation or in a cell-free
extract of an in vitro expression system.

Methods for production of antibodies that specifically bind a selected antigen are well known
in the art. Immunogens for raising antibodies can be prepared by mixing a polypeptide encoded by a
polynucleotide of the invention with an adjuvant, and/or by making fusion proteins with larger
immunogenic proteins. Polypeptides can also be covalently linked to other larger immunogenic
proteins, such as keyhole limpet hemocyanin. Immunogens are typically administered intradermally,
subcutaneously, or intramuscularly to experimental animals such as rabbits, sheep, and mice, to
generate antibodies. Monoclonal antibodies can be generated by isolating spleen cells and fusing
them with myeloma cells to form hybridomas. Alternatively, the selected polynucleotide is
administered directly, such as by intramuscular injection, and expressed in vivo. The expressed
protein generates a variety of protein-specific immune responses, including production of antibodies,
comparable to administration of the protein.

Preparations of polyclonal and monoclonal antibodies specific for polypeptides encoded by a
selected polynucleotide are made using standard methods known in the art. The antibodies
specifically bind to epitopes present in the polypeptides encoded by polynucleotides disclosed in the
Sequence Listing. Typically, at least 6, 8, 10, or 12 contiguous amino acids are required to form an
epitope. Epitopes that involve non-contiguous amino acids may require a longer polypeptide, e.g., at
least 15, 25, or 50 amino acids. Antibodies that specifically bind to human polypeptides encoded by
the provided polynucleotides should provide a detection signal at least 5-, 10-, or 20-fold higher than a
detection signal provided with other proteins when used in Western blots or other immunochemical
assays. Preferably, antibodies that specifically bind polypeptides contemplated by the invention do
not bind to other proteins in immunochemical assays at detectable levels and can immunoprecipitate
the specific polypeptide from solution.

The invention also contemplates naturally occurring antibodies specific for a polypeptide of
the invention. For example, serum antibodies to a polypeptide of the invention in a human population
can be purified by methods well known in the art, e.g., by passing antiserum over a column to which
the corresponding selected polypeptide or fusion protein is bound. The bound antibodies can then be
eluted from the column, for example, using a buffer with a high salt concentration.

In addition to the antibodies discussed above, the invention also contemplates genetically
engineered antibodies (e.g., chimeric antibodies, humanized antibodies, human antibodies
produced by a transgenic animal (e.g., a transgenic mouse such as the Xenomouse™)), antibody
derivatives (e.g., single chain antibodies, antibody fragments (e.g., Fab, etc.)), according to methods
well known in the art.

The invention also contemplates other molecules that can specifically bind a polynucleotide or
polypeptide of the invention. Examples of such molecules include, but are not necessarily limited to,
single-chain binding proteins (e.g., mono- and multi-valent single chain antigen binding proteins (see,
e.g., U.S. Patent Nos. 4,704,692; 4,946,778; 6,027,725; 6,121,424)), oligonucleotide-based
synthetic antibodies (e.g., oligobodies (see, e.g., Radrizzani et al., Medicina (B Aires) (1999)
59:753-8; Radrizzani et al., Medicina (B Aires) (2000) 60(Suppl 2):55-60)), aptamers (see, e.g.,
Gening et al., Biotechniques (2001) 3:828, 830, 832, 834; Cox and Ellington, Bioorg. Med. Chem.
(2001) 9:2525-31), and the like.
Polynucleotides or Arrays for Diagnostics.

Polynucleotide arrays provide a high throughput technique that can assay a large number of
polynucleotides in a sample. This technology can be used as a diagnostic and as a tool to test for
differential expression, e.g., to determine function of an encoded protein. A variety of
methods of producing arrays, as well as variations of these methods, are known in the art and
contemplated for use in the invention. For example, arrays can be created by spotting polynucleotide
probes onto a substrate (e.g., glass, nitrocellulose, etc.) in a two-dimensional matrix or array having
bound probes. The probes can be bound to the substrate by either covalent bonds or by non-specific
interactions, such as hydrophobic interactions. Samples of polynucleotides can be detectably labeled
(e.g., using radioactive or fluorescent labels) and then hybridized to the probes. Double stranded
polynucleotides, comprising the labeled sample polynucleotides bound to probe polynucleotides, can
be detected once the unbound portion of the sample is washed away. Alternatively, the
polynucleotides of the test sample can be immobilized on the array, and the probes detectably labeled.
Techniques for constructing arrays and methods of using these arrays are described in, for example,
Schena et al. (1996) Proc Natl Acad Sci USA. 93(20):10614-9; Schena et al. (1995) Science
270(5235):467-70; Shalon et al. (1996) Genome Res. 6(7):639-45; USPN 5,807,522; EP 799 897;
WO 97/29212; WO 97/27317; EP 785 280; WO 97/02357; USPN 5,593,839; USPN 5,578,832; EP
728 520; USPN 5,599,695; EP 721 016; USPN 5,556,752; WO 95/22058; and USPN 5,631,734.

Arrays can be used to, for example, examine differential expression of genes and can be used
to determine gene function. For example, arrays can be used to detect differential expression of a
gene corresponding to a polynucleotide of the invention, where expression is compared between a test
cell and control cell (e.g., cancer cells and normal cells). For example, high expression of a particular
message in a cancer cell, which is not observed in a corresponding normal cell, can indicate a cancer
specific gene product. Exemplary uses of arrays are further described in, for example, Pappalarado et
al., Sem. Radiation Oncol. (1998) 8:217; and Ramsay, Nature Biotechnol. (1998) 16:40. Furthermore,
many variations on methods of detection using arrays are well within the skill in the art and within the
scope of the present invention. For example, rather than immobilizing the probe to a solid support,
the test sample can be immobilized on a solid support which is then contacted with the probe.
Differential Expression in Diagnosis 

The polynucleotides of the invention can also be used to detect differences in expression
levels between two cells, e.g., as a method to identify abnormal or diseased tissue in a human. For
polynucleotides corresponding to profiles of protein families, the choice of tissue can be selected
according to the putative biological function. In general, the expression of a gene corresponding to a
specific polynucleotide is compared between a first tissue that is suspected of being diseased and a
second, normal tissue of the human. The tissue suspected of being abnormal or diseased can be
derived from a different tissue type of the human, but preferably it is derived from the same tissue
type; for example, an intestinal polyp or other abnormal growth should be compared with normal
intestinal tissue. The normal tissue can be the same tissue as that of the test sample, or any normal
tissue of the patient, especially those that express the polynucleotide-related gene of interest (e.g.,
brain, thymus, testis, heart, prostate, placenta, spleen, small intestine, skeletal muscle, pancreas, and
the mucosal lining of the colon). A difference between the polynucleotide-related gene, mRNA, or
protein in the two tissues which are compared, for example, in molecular weight, amino acid or
nucleotide sequence, or relative abundance, indicates a change in the gene, or a gene which regulates
it, in the tissue of the human that was suspected of being diseased. Examples of detection of
differential expression and its use in diagnosis of cancer are described in USPNs 5,688,641 and
5,677,125.

A genetic predisposition to disease in a human can also be detected by comparing expression
levels of an mRNA or protein corresponding to a polynucleotide of the invention in a fetal tissue with
levels associated with normal fetal tissue. Fetal tissues that are used for this purpose include, but are
not limited to, amniotic fluid, chorionic villi, blood, and the blastomere of an in vitro-fertilized embryo.
The comparable normal polynucleotide-related gene is obtained from any tissue. The mRNA or
protein is obtained from a normal tissue of a human in which the polynucleotide-related gene is
expressed. Differences such as alterations in the nucleotide sequence or size of the same product of
the fetal polynucleotide-related gene or mRNA, or alterations in the molecular weight, amino acid
sequence, or relative abundance of fetal protein, can indicate a germline mutation in the
polynucleotide-related gene of the fetus, which indicates a genetic predisposition to disease. In
general, diagnostic, prognostic, and other methods of the invention based on differential expression
involve detection of a level or amount of a gene product, particularly a differentially expressed gene
product, in a test sample obtained from a patient suspected of having or being susceptible to a disease
(e.g., breast cancer, prostate cancer, lung cancer, colon cancer and/or metastatic forms thereof), and
comparing the detected levels to those levels found in normal cells (e.g., cells substantially unaffected
by cancer) and/or other control cells (e.g., to differentiate a cancerous cell from a cell affected by
dysplasia). Furthermore, the severity of the disease can be assessed by comparing the detected levels
of a differentially expressed gene product with those levels detected in samples representing the levels
of differentially expressed gene product associated with varying degrees of severity of disease. It
should be noted that use of the term "diagnostic" herein is not necessarily meant to exclude
"prognostic" or "prognosis," but rather is used as a matter of convenience.

The term "differentially expressed gene" is generally intended to encompass a polynucleotide
that can, for example, include an open reading frame encoding a gene product (e.g., a polypeptide),
and/or introns of such genes and adjacent 5' and 3' non-coding nucleotide sequences involved in the
regulation of expression, up to about 20 kb beyond the coding region, but possibly further in either
direction. The gene can be introduced into an appropriate vector for extrachromosomal maintenance
or for integration into a host genome. In general, a difference in expression level associated with a
decrease in expression level of at least about 25%, usually at least about 50% to 75%, more usually at
least about 90% or more is indicative of a differentially expressed gene of interest, i.e., a gene that is
underexpressed or down-regulated in the test sample relative to a control sample. Furthermore, a
difference in expression level associated with an increase in expression of at least about 25%, usually
at least about 50% to 75%, more usually at least about 90% and can be at least about 1½-fold, usually
at least about 2-fold to about 10-fold, and can be about 100-fold to about 1,000-fold increase relative
to a control sample is indicative of a differentially expressed gene of interest, i.e., an overexpressed or
up-regulated gene.
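
The thresholds above can be expressed as a fractional change relative to the control. The Python
sketch below is one possible reading, not a definitive implementation: the function name
classify_expression and the labels it returns are assumptions, and the default threshold of 0.25
corresponds to the lowest ("at least about 25%") level named in the text.

```python
def classify_expression(test_level, control_level, threshold=0.25):
    """Label a gene by its fractional change in expression relative to a control."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    change = (test_level - control_level) / control_level
    if change <= -threshold:
        return "underexpressed"
    if change >= threshold:
        return "overexpressed"
    return "unchanged"

# Hypothetical expression levels in arbitrary units.
label_down = classify_expression(50, 100)   # 50% decrease
label_up = classify_expression(250, 100)    # 150% increase
```

A stricter call, for example threshold=0.90, applies the "at least about 90%" level instead.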

"Differentially expressed polynucleotide" as used herein means a nucleic acid molecule (RNA
or DNA) comprising a sequence that represents a differentially expressed gene, e.g., the differentially
expressed polynucleotide comprises a sequence (e.g., an open reading frame encoding a gene product)
that uniquely identifies a differentially expressed gene so that detection of the differentially expressed
polynucleotide in a sample is correlated with the presence of a differentially expressed gene in a
sample. "Differentially expressed polynucleotide" is also meant to encompass fragments of the
disclosed polynucleotides, e.g., fragments retaining biological activity, as well as nucleic acids
homologous, substantially similar, or substantially identical (e.g., having about 90% sequence
identity) to the disclosed polynucleotides.

Methods of the subject invention useful in diagnosis or prognosis typically involve
comparison of the abundance of a selected differentially expressed gene product in a sample of
interest with that of a control to determine any relative differences in the expression of the gene
product, where the difference can be measured qualitatively and/or quantitatively. Quantitation can be
accomplished, for example, by comparing the level of expression product detected in the sample with
the amounts of product present in a standard curve. A comparison can be made visually, by using a
technique such as densitometry, with or without computerized assistance; by preparing a
representative library of cDNA clones of mRNA isolated from a test sample, sequencing the clones in
the library to determine the number of cDNA clones corresponding to the same gene product, and
analyzing the number of clones corresponding to that same gene product relative to the number of
clones of the same gene product in a control sample; or by using an array to detect relative levels of
hybridization to a selected sequence or set of sequences, and comparing the hybridization pattern to
that of a control. The differences in expression are then correlated with the presence or absence of an
abnormal expression pattern. A variety of different methods for determining the nucleic acid
abundance in a sample are known to those of skill in the art (see, e.g., WO 97/27317).
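
The clone-counting comparison described above can be sketched in Python as follows. Normalizing each
count by the total number of clones sequenced in its library is an assumption made here so that
libraries of different sizes can be compared; the text itself does not specify a normalization. The
function name and the counts are hypothetical.

```python
def relative_abundance(test_count, test_library_size, control_count, control_library_size):
    """Ratio of clone frequency in the test library to that in the control library."""
    if test_library_size <= 0 or control_library_size <= 0:
        raise ValueError("library sizes must be positive")
    if control_count <= 0:
        raise ValueError("control_count must be positive to form a ratio")
    # Cross-multiplying avoids rounding from two separate divisions:
    # (test_count / test_library_size) / (control_count / control_library_size)
    return (test_count * control_library_size) / (control_count * test_library_size)

# Hypothetical counts: 30 of 1000 test clones versus 5 of 500 control clones.
ratio = relative_abundance(30, 1000, 5, 500)
```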

In general, diagnostic assays of the invention involve detection of a gene product of a
polynucleotide sequence (e.g., mRNA or polypeptide) that corresponds to a sequence of SEQ ID
NOS: 1-1485. The patient from whom the sample is obtained can be apparently healthy, susceptible to
disease (e.g., as determined by family history or exposure to certain environmental factors), or can
already be identified as having a condition in which altered expression of a gene product of the
invention is implicated.

Diagnosis can be determined based on detected gene product expression levels of a gene
product encoded by at least one, preferably at least two or more, at least 3 or more, or at least 4 or
more of the polynucleotides having a sequence set forth in SEQ ID NOS: 1-1485, and can involve
detection of expression of genes corresponding to all of SEQ ID NOS: 1-1485 and/or additional
sequences that can serve as additional diagnostic markers and/or reference sequences. Where the
diagnostic method is designed to detect the presence or susceptibility of a patient to cancer, the assay
preferably involves detection of a gene product encoded by a gene corresponding to a polynucleotide
that is differentially expressed in cancer. Examples of such differentially expressed polynucleotides
are described in the Examples below. Given the provided polynucleotides and information regarding
their relative expression levels provided herein, assays using such polynucleotides and detection of
their expression levels in diagnosis and prognosis will be readily apparent to the ordinarily skilled
artisan.

Any of a variety of detectable labels can be used in connection with the various embodiments
of the diagnostic methods of the invention. Suitable detectable labels include fluorochromes (e.g.,
fluorescein isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, allophycocyanin,
6-carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein,
6-carboxy-X-rhodamine (ROX), 6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein
(5-FAM) or N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA)), radioactive labels (e.g., 32P,
35S, 3H, etc.), and the like. The detectable label can involve a two-stage system (e.g., biotin-avidin,
hapten-anti-hapten antibody, etc.).

Reagents specific for the polynucleotides and polypeptides of the invention, such as
antibodies and nucleotide probes, can be supplied in a kit for detecting the presence of an expression
product in a biological sample. The kit can also contain buffers or labeling components, as well as 
instructions for using the reagents to detect and quantify expression products in the biological sample. 
Exemplary embodiments of the diagnostic methods of the invention are described below in more 
detail. 

Polypeptide detection in diagnosis. In one embodiment, the test sample is assayed for the
level of a differentially expressed polypeptide, such as a polypeptide of a gene corresponding to SEQ
ID NOS:1-1485 and/or a polypeptide comprising a sequence of SEQ ID NOS:1486-1542. Diagnosis
can be accomplished using any of a number of methods to determine the absence or presence or
altered amounts of the differentially expressed polypeptide in the test sample. For example, detection
can utilize staining of cells or histological sections with labeled antibodies, performed in accordance
with conventional methods. Cells can be permeabilized to stain cytoplasmic molecules. In general,
antibodies that specifically bind a differentially expressed polypeptide of the invention are added to a
sample, and incubated for a period of time sufficient to allow binding to the epitope, usually at least
about 10 minutes. The antibody can be detectably labeled for direct detection (e.g., using
radioisotopes, enzymes, fluorescers, chemiluminescers, and the like), or can be used in conjunction
with a second stage antibody or reagent to detect binding (e.g., biotin with horseradish
peroxidase-conjugated avidin, a secondary antibody conjugated to a fluorescent compound, e.g., fluorescein,
rhodamine, Texas red, etc.). The absence or presence of antibody binding can be determined by
various methods, including flow cytometry of dissociated cells, microscopy, radiography, scintillation
counting, etc. Any suitable alternative methods of qualitative or quantitative detection of levels or
amounts of differentially expressed polypeptide can be used, for example, ELISA, western blot,
immunoprecipitation, radioimmunoassay, etc.

mRNA detection. The diagnostic methods of the invention can also or alternatively involve 
detection of mRNA encoded by a gene corresponding to a differentially expressed polynucleotide of 
the invention. Any suitable qualitative or quantitative methods known in the art for detecting specific
mRNAs can be used. mRNA can be detected by, for example, in situ hybridization in tissue sections,
by reverse transcriptase-PCR, or in Northern blots containing poly A+ mRNA. One of skill in the art 
can readily use these methods to determine differences in the size or amount of mRNA transcripts 
between two samples. mRNA expression levels in a sample can also be determined by generation of a 
library of expressed sequence tags (ESTs) from the sample, where the EST library is representative of
sequences present in the sample (Adams et al., (1991) Science 252:1651). Enumeration of the relative
representation of ESTs within the library can be used to approximate the relative representation of the 
gene transcript within the starting sample. The results of EST analysis of a test sample can then be 
compared to EST analysis of a reference sample to determine the relative expression levels of a 
selected polynucleotide, particularly a polynucleotide corresponding to one or more of the
differentially expressed genes described herein. Alternatively, gene expression in a test sample can be
analyzed using serial analysis of gene expression (SAGE) methodology (e.g., Velculescu et al.,
Science (1995) 270:484) or differential display (DD) methodology (see, e.g., USPN 5,776,683 and 
USPN 5,807,680). 
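
The EST enumeration described above can be illustrated with a minimal sketch. It assumes each EST has already been assigned to a gene; the gene names, library contents, and function names are hypothetical and are not part of the disclosure.

```python
from collections import Counter


def relative_representation(est_gene_ids):
    """Return the fraction of ESTs in a library that map to each gene."""
    counts = Counter(est_gene_ids)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def expression_ratio(test_ests, reference_ests, gene):
    """Ratio of a gene's EST fraction in the test library to the reference library.

    Returns None when the gene is absent from the reference library,
    because the ratio is undefined in that case.
    """
    test_fraction = relative_representation(test_ests).get(gene, 0.0)
    ref_fraction = relative_representation(reference_ests).get(gene, 0.0)
    if ref_fraction == 0.0:
        return None
    return test_fraction / ref_fraction


# Hypothetical libraries: each entry names the gene an EST was assigned to.
test_library = ["geneA"] * 5 + ["geneB"] * 5
reference_library = ["geneA"] * 2 + ["geneB"] * 6

print(expression_ratio(test_library, reference_library, "geneA"))
```

Normalizing by library size before taking the ratio keeps libraries of different depth comparable. A real analysis would also need to account for sampling error at low EST counts.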

Alternatively, gene expression can be analyzed using hybridization analysis. Oligonucleotides
or cDNA can be used to selectively identify or capture DNA or RNA of specific sequence
composition, and the amount of RNA or cDNA hybridized to a known capture sequence determined
qualitatively or quantitatively, to provide information about the relative representation of a particular 
message within the pool of cellular messages in a sample. Hybridization analysis can be designed to 
allow for concurrent screening of the relative expression of hundreds to thousands of genes by using, 
for example, array-based technologies having high density formats, including filters, microscope
slides, or microchips, or solution-based technologies that use spectroscopic analysis (e.g., mass
spectrometry). One exemplary use of arrays in the diagnostic methods of the invention is described
below in more detail. 

Use of a single gene in diagnostic applications. The diagnostic methods of the invention can 
focus on the expression of a single differentially expressed gene. For example, the diagnostic method
can involve detecting a differentially expressed gene, or a polymorphism of such a gene (e.g., a
polymorphism in a coding region or control region), that is associated with disease. Disease- 
associated polymorphisms can include deletion or truncation of the gene, mutations that alter 
expression level and/or affect activity of the encoded protein, etc. 

A number of methods are available for analyzing nucleic acids for the presence of a specific
sequence, e.g., a disease-associated polymorphism. Where large amounts of DNA are available,
genomic DNA is used directly. Alternatively, the region of interest is cloned into a suitable vector and
grown in sufficient quantity for analysis. Cells that express a differentially expressed gene can be
used as a source of mRNA, which can be assayed directly or reverse transcribed into cDNA for
analysis. The nucleic acid can be amplified by conventional techniques, such as the polymerase chain
reaction (PCR), to provide sufficient amounts for analysis, and a detectable label can be included in
the amplification reaction (e.g., using a detectably labeled primer or detectably labeled
oligonucleotides) to facilitate detection. Alternatively, various methods are also known in the art that
utilize oligonucleotide ligation as a means of detecting polymorphisms, see, e.g., Riley et al., Nucl.
Acids Res. (1990) 18:2887; and Delahunty et al., Am. J. Hum. Genet. (1996) 58:1239.

The amplified or cloned sample nucleic acid can be analyzed by one of a number of methods
known in the art. The nucleic acid can be sequenced by dideoxy or other methods, and the sequence
of bases compared to a selected sequence, e.g., to a wild-type sequence. Hybridization with the 
polymorphic or variant sequence can also be used to determine its presence in a sample (e.g., by 
Southern blot, dot blot, etc.). The hybridization pattern of a polymorphic or variant sequence and a
control sequence to an array of oligonucleotide probes immobilized on a solid support, as described in
US 5,445,934, or in WO 95/35505, can also be used as a means of identifying polymorphic or variant 
sequences associated with disease. Single strand conformational polymorphism (SSCP) analysis, 
denaturing gradient gel electrophoresis (DGGE), and heteroduplex analysis in gel matrices are used to 
detect conformational changes created by DNA sequence variation as alterations in electrophoretic
mobility. Alternatively, where a polymorphism creates or destroys a recognition site for a restriction
endonuclease, the sample is digested with that endonuclease, and the products size fractionated to 
determine whether the fragment was digested. Fractionation is performed by gel or capillary 
electrophoresis, particularly acrylamide or agarose gels. 
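
The restriction-site approach above can be sketched as follows. This is only an illustration under stated assumptions: the two short sequences are invented, the enzyme is assumed to be EcoRI (recognition site GAATTC, cutting after the first G), and `digest` is a hypothetical helper that treats the DNA as a single linear strand.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear DNA sequence at every occurrence of a recognition site.

    cut_offset is the number of bases into the site after which the cut falls.
    Returns the fragment lengths in order along the sequence.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


ECORI_SITE = "GAATTC"  # EcoRI cuts between G and AATTC
wild_type = "AAAAGAATTCTTTT"  # hypothetical allele carrying the site
variant = "AAAAGAGTTCTTTT"    # hypothetical allele in which the site is destroyed

print(digest(wild_type, ECORI_SITE, 1))
print(digest(variant, ECORI_SITE, 1))
```

An allele that retains the site yields two fragments, while an allele that has lost it yields a single uncut fragment. Size fractionation distinguishes exactly this difference.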

Screening for mutations in a gene can be based on the functional or antigenic characteristics
of the protein. Protein truncation assays are useful in detecting deletions that can affect the biological
activity of the protein. Various immunoassays designed to detect polymorphisms in proteins can be
used in screening. Where many diverse genetic mutations lead to a particular disease phenotype,
functional protein assays have proven to be effective screening tools. The activity of the encoded 
protein can be determined by comparison with the wild-type protein. 

Diagnosis, Prognosis, Assessment of Therapy (Therametrics) and Management of Cancer
The polynucleotides of the invention, as well as their gene products, are of particular interest
as genetic or biochemical markers (e.g., in blood or tissues) that can detect the earliest changes along
the carcinogenesis pathway and/or monitor the efficacy of various therapies and preventive
interventions. For example, the level of expression of certain polynucleotides can be indicative of a 
poorer prognosis, and therefore warrant more aggressive chemo- or radio-therapy for a patient or vice
versa. The correlation of novel surrogate tumor specific features with response to treatment and
outcome in patients can define prognostic indicators that allow the design of tailored therapy based on
the molecular profile of the tumor. These therapies include antibody targeting, antagonists (e.g., small
molecules), and gene therapy. Determining expression of certain polynucleotides and comparison of a
patient's profile with known expression in normal tissue and variants of the disease allows a
determination of the best possible treatment for a patient, both in terms of specificity of treatment and
in terms of comfort level of the patient. Surrogate tumor markers, such as polynucleotide expression, 
can also be used to better classify, and thus diagnose and treat, different forms and disease states of 
cancer. Two classifications widely used in oncology that can benefit from identification of the 
expression levels of the genes corresponding to the polynucleotides of the invention are staging of the
cancerous disorder, and grading the nature of the cancerous tissue.

The polynucleotides that correspond to differentially expressed genes, as well as their encoded 
gene products, can be useful to monitor patients having or susceptible to cancer to detect potentially 
malignant events at a molecular level before they are detectable at a gross morphological level. In 
addition, the polynucleotides of the invention, as well as the genes corresponding to such
polynucleotides, can be useful as therametrics, e.g., to assess the effectiveness of therapy by using the
polynucleotides or their encoded gene products, to assess, for example, tumor burden in the patient 
before, during, and after therapy. 

Furthermore, a polynucleotide identified as corresponding to a gene that is differentially 
expressed in, and thus is important for, one type of cancer can also have implications for development
or risk of development of other types of cancer, e.g., where a polynucleotide represents a gene
differentially expressed across various cancer types. Thus, for example, expression of a 
polynucleotide corresponding to a gene that has clinical implications for metastatic colon cancer can 
also have clinical implications for stomach cancer or endometrial cancer. 

Staging. Staging is a process used by physicians to describe how advanced the cancerous
state is in a patient. Staging assists the physician in determining a prognosis, planning treatment and
evaluating the results of such treatment. Staging systems vary with the types of cancer, but generally
involve the following "TNM" system: the type of tumor, indicated by T; whether the cancer has
metastasized to nearby lymph nodes, indicated by N; and whether the cancer has metastasized to more
distant parts of the body, indicated by M. Generally, if a cancer is only detectable in the area of the
primary lesion without having spread to any lymph nodes, it is called Stage I or Stage II, depending on
the degree of invasiveness as indicated by the tumor grade of the primary lesion. If the primary lesion
is of tumor grade I or II and the patient does not have any regional or distant metastasis, the cancer is
classified as Stage I. If the primary lesion is of tumor grade III or IV and the patient does not have any
regional or distant metastasis, the cancer is classified as Stage II. If the cancer has spread only to the
regional lymph nodes, it is classified as Stage III. Cancers that have spread to a distant part of the
body, such as liver, bone, brain or other sites, are Stage IV, the most advanced stage.
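
The general staging rules in the preceding paragraph can be written out as a small decision function. This is a sketch of the simplified scheme as stated in the text only, not of any clinical staging system for a particular cancer; the function and parameter names are hypothetical.

```python
def simplified_stage(tumor_grade, regional_nodes, distant_metastasis):
    """Assign Stage I-IV using the general rules described in the text.

    tumor_grade: integer 1-4 (tumor grade I-IV of the primary lesion)
    regional_nodes: True if the cancer has spread to regional lymph nodes
    distant_metastasis: True if the cancer has spread to a distant site
    """
    if tumor_grade not in (1, 2, 3, 4):
        raise ValueError("tumor_grade must be 1, 2, 3 or 4")
    if distant_metastasis:
        return "IV"
    if regional_nodes:
        return "III"
    # No regional or distant spread: the stage depends on the tumor grade.
    return "I" if tumor_grade <= 2 else "II"


print(simplified_stage(3, regional_nodes=False, distant_metastasis=False))
```

Distant spread is checked first because it overrides every other factor. Tumor grade only separates Stage I from Stage II when no spread is detected.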

The polynucleotides of the invention can facilitate fine-tuning of the staging process by
identifying markers for the aggressiveness of a cancer, e.g., the metastatic potential, as well as the
presence in different areas of the body. Thus, detection in a Stage II cancer of a polynucleotide signifying
high metastatic potential can be used to change a borderline Stage II tumor to a Stage III tumor,
justifying more aggressive therapy. Conversely, the presence of a polynucleotide signifying a lower
metastatic potential allows more conservative staging of a tumor.

Grading of cancers. Grade is a term used to describe how closely a tumor resembles normal 
tissue of its same type. The microscopic appearance of a tumor is used to identify tumor grade based 
on parameters such as cell morphology, cellular organization, and other markers of differentiation. As
a general rule, the grade of a tumor corresponds to its rate of growth or aggressiveness, with
undifferentiated or high-grade tumors being more aggressive than well-differentiated or low-grade
tumors. The following guidelines are generally used for grading tumors: 1) GX Grade cannot be
assessed; 2) G1 Well differentiated; 3) G2 Moderately well differentiated; 4) G3 Poorly differentiated;
5) G4 Undifferentiated. The polynucleotides of the invention can be especially valuable in
determining the grade of the tumor, as they not only can aid in determining the differentiation status of
the cells of a tumor, they can also identify factors other than differentiation that are valuable in 
determining the aggressiveness of a tumor, such as metastatic potential. 

For prostate cancer, the Gleason Grading/Scoring system is most commonly used. A prostate 
biopsy tissue sample is examined under a microscope and a grade is assigned to the tissue based on: 1)
the appearance of the cells, and 2) the arrangement of the cells. Each parameter is assessed on a scale
of one (cells are almost normal) to five (abnormal), and the individual Gleason Grades are presented 
separated by a "+" sign. Alternatively, the two grades are combined to give a Gleason Score of 2-10. 
Thus, for a tissue sample that received a grade of 3 for each parameter, the Gleason Grade would be 
3+3 and the Gleason Score would be 6. A lower Gleason Score indicates a well-differentiated tumor,
while a higher Gleason Score indicates a poorly differentiated cancer that is more likely to spread.
The majority of biopsies in general are Gleason Scores 5, 6 and 7.

Gleason Score 2, 3, 4            Gleason Score 5, 6, 7            Gleason Score 8, 9, 10
Low-grade tumor                  Medium-grade tumor               High-grade tumor
Slow Growth                      Unpredictable Growth             Aggressive Growth
Least dangerous.                 Intermediate cancers may         High-grade cancers are usually
Cells look most like normal      behave like low-grade or         very aggressive and quick to
prostate cells and are           high-grade cancers.              spread to the tissue
described as being               The cells' behavior may          surrounding the prostate.
"well-differentiated".           depend on the volume of the      These cancer cells look least
Tends to be slow growing.        cancer and the PSA level.        like normal prostate cells and
                                 This is the most common          are usually described as
                                 grade of prostate cancer.        "poorly-differentiated".



The polynucleotides of the Sequence Listing, and their corresponding genes and gene 
products, can be especially valuable in determining the grade of the tumor, as they not only can aid in
determining the differentiation status of the cells of a tumor, they can also identify factors other than
differentiation that are valuable in determining the aggressiveness of a tumor, such as metastatic
potential. 
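
The Gleason arithmetic and the three score bands tabulated above can be sketched as follows. The function name and return format are illustrative assumptions; the band boundaries (2-4, 5-7, 8-10) are taken from the table.

```python
def gleason_summary(appearance_grade, arrangement_grade):
    """Combine two Gleason Grades (1-5 each) into a grade string, score and band."""
    for grade in (appearance_grade, arrangement_grade):
        if grade not in (1, 2, 3, 4, 5):
            raise ValueError("each Gleason Grade must be an integer from 1 to 5")
    score = appearance_grade + arrangement_grade
    if score <= 4:
        band = "low-grade"
    elif score <= 7:
        band = "medium-grade"
    else:
        band = "high-grade"
    return f"{appearance_grade}+{arrangement_grade}", score, band


# The worked example from the text: a grade of 3 for each parameter.
print(gleason_summary(3, 3))
```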

Assessment of proliferation of cells in tumor. The differential expression level of the 
polynucleotides described herein can facilitate assessment of the rate of proliferation of tumor cells, 
and thus provide an indicator of the aggressiveness of tumor growth. For example,
assessment of the relative expression levels of genes involved in the cell cycle can provide an
indication of cellular proliferation, and thus serve as a marker of proliferation. 

Detection of colon cancer. The polynucleotides corresponding to genes that exhibit the
appropriate expression pattern can be used to detect colon cancer in a subject. Colorectal cancer is
one of the most common neoplasms in humans and perhaps the most frequent form of hereditary
neoplasia. Prevention and early detection are key factors in controlling and curing colorectal cancer.
Colorectal cancer begins as polyps, which are small, benign growths of cells that form on the inner
lining of the colon. Over a period of several years, some of these polyps accumulate additional
mutations and become cancerous. Multiple familial colorectal cancer disorders have been identified,
which are summarized as follows: 1) Familial adenomatous polyposis (FAP); 2) Gardner's syndrome;
3) Hereditary nonpolyposis colon cancer (HNPCC); and 4) Familial colorectal cancer in Ashkenazi
Jews. The expression of appropriate polynucleotides of the invention can be used in the diagnosis,
prognosis and management of colorectal cancer. Detection of colon cancer can be determined using
expression levels of any of these sequences alone or in combination with the levels of expression of
any other nucleotide sequences.
Determination of the aggressive nature and/or the metastatic potential of a colon cancer can be
determined by comparing levels of one or more polynucleotides of the invention and comparing total
levels of another sequence known to vary in cancerous tissue, e.g., expression of p53, DCC, ras, or
FAP (see, e.g., Fearon ER, et al., Cell (1990) 61(5):759; Hamilton SR et al., Cancer (1993) 72:957;
Bodmer W, et al., Nat Genet. (1994) 4(3):217; Fearon ER, Ann N Y Acad Sci. (1995) 768:101). For
example, development of colon cancer can be detected by examining the ratio of any of the
polynucleotides of the invention to the levels of oncogenes (e.g., ras) or tumor suppressor genes (e.g.,
FAP or p53). Thus, expression of specific marker polynucleotides can be used to discriminate
between normal and cancerous colon tissue, to discriminate between colon cancers with different cells
of origin, to discriminate between colon cancers with different potential metastatic rates, etc. For a
review of markers of cancer, see, e.g., Hanahan et al. (2000) Cell 100:57-70.

Detection of prostate cancer. The polynucleotides and their corresponding genes and gene
products exhibiting the appropriate differential expression pattern can be used to detect prostate
cancer in a subject. Prostate cancer is quite common in humans, with one out of every six men at a
lifetime risk for prostate cancer, and can be relatively harmless or extremely aggressive. Some
prostate tumors are slow growing, causing few clinical symptoms, while aggressive tumors spread
rapidly to the lymph nodes, other organs and especially bone. Over 95% of primary prostate cancers
are adenocarcinomas. Signs and symptoms may include: frequent urination, especially at night;
inability to urinate; trouble starting or holding back urination; a weak or interrupted urine flow; and
frequent pain or stiffness in the lower back, hips or upper thighs.

The prostate is divided into three areas - the peripheral zone, the transition zone, and the
central zone - with a layer of tissue surrounding all three. Most prostate tumors form in the peripheral
zone, the larger, glandular portion of the organ. Prostate cancer can also form in the tissue of the
central zone. Surrounding the prostate is the prostate capsule, a tissue that separates the prostate from
the rest of the body. When prostate cancer remains inside the prostate capsule, it is considered
localized and treatable with surgery. Once the cancer punctures the capsule and spreads outside,
treatment options are more limited. Prevention and early detection are key factors in controlling and
curing prostate cancer.

While the Gleason Grade or Score of a prostate cancer can provide information useful in
determining the appropriate treatment of a prostate cancer, the majority of prostate cancers are
Gleason Scores 5, 6, and 7, which exhibit unpredictable behavior. These cancers may behave like less 
dangerous low-grade cancers or like extremely dangerous high-grade cancers. As a result, a patient 
living with a medium-grade prostate cancer is at constant risk of developing high-grade cancer. 

The expression of appropriate polynucleotides can be used in the diagnosis, prognosis and
management of prostate cancer. Detection of prostate cancer can be determined using expression
levels of any of these sequences alone or in combination with the levels of expression of any other
nucleotide sequences. Determination of the aggressive nature and/or the metastatic potential of a
prostate cancer can be determined by comparing levels of one or more gene products of the genes
corresponding to the polynucleotides described herein, and comparing total levels of another sequence
known to vary in cancerous tissue, e.g., expression of p53, DCC, ras, or FAP (see, e.g., Fearon ER, et
al., Cell (1990) 61(5):759; Hamilton SR et al., Cancer (1993) 72:957; Bodmer W, et al., Nat Genet.
(1994) 4(3):217; Fearon ER, Ann N Y Acad Sci. (1995) 768:101).

For example, development of prostate cancer can be detected by examining the ratio of the level of
expression of a gene corresponding to a polynucleotide described herein to the levels of oncogenes
(e.g., ras) or tumor suppressor genes (e.g., FAP or p53). Thus, expression of specific marker
polynucleotides can be used to discriminate between normal and cancerous prostate tissue, to
discriminate between prostate cancers with different cells of origin, to discriminate between prostate
cancers with different potential metastatic rates, etc. For a review of markers of cancer, see, e.g.,
Hanahan et al. (2000) Cell 100:57-70.
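
The comparison of a marker's expression level to the level of an oncogene or tumor suppressor gene can be illustrated with a short sketch. All expression values, the gene label "marker", and the function names are hypothetical; no diagnostic threshold is implied, since the text does not specify one.

```python
def marker_ratio(sample, marker, reference):
    """Ratio of a marker's expression level to a reference gene's level in one sample."""
    if sample[reference] <= 0:
        raise ValueError("reference expression level must be positive")
    return sample[marker] / sample[reference]


def ratio_fold_change(test_sample, normal_sample, marker, reference):
    """How many times larger the marker/reference ratio is in the test sample."""
    return (marker_ratio(test_sample, marker, reference)
            / marker_ratio(normal_sample, marker, reference))


# Hypothetical expression levels in arbitrary units.
normal_tissue = {"marker": 10.0, "ras": 20.0}
test_tissue = {"marker": 40.0, "ras": 20.0}

print(ratio_fold_change(test_tissue, normal_tissue, "marker", "ras"))
```

Expressing the marker relative to a second gene measured in the same sample reduces the effect of sample-to-sample differences in total RNA yield.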

In addition, many of the signs and symptoms of prostate cancer can be caused by a variety of
other non-cancerous conditions. For example, one common cause of many of these signs and
symptoms is a condition called benign prostatic hypertrophy, or BPH. In BPH, the prostate gets bigger
and may block the flow of urine or interfere with sexual function. The methods and compositions of
the invention can be used to distinguish between prostate cancer and such non-cancerous conditions.
The methods of the invention can be used in conjunction with conventional methods of diagnosis,
e.g., digital rectal exam and/or detection of the level of prostate specific antigen (PSA), a substance
produced and secreted by the prostate, and/or prostatic acid phosphatase (PAP).

Detection of breast cancer. The majority of breast cancers are adenocarcinoma subtypes,
which can be summarized as follows: 1) ductal carcinoma in situ (DCIS), including
comedocarcinoma; 2) infiltrating (or invasive) ductal carcinoma (IDC); 3) lobular carcinoma in situ
(LCIS); 4) infiltrating (or invasive) lobular carcinoma (ILC); 5) inflammatory breast cancer; 6)
medullary carcinoma; 7) mucinous carcinoma; 8) Paget's disease of the nipple; 9) Phyllodes tumor;
and 10) tubular carcinoma.

The expression of polynucleotides of the invention can be used in the diagnosis and
management of breast cancer, as well as to distinguish between types of breast cancer. Detection of
breast cancer can be determined using expression levels of any of the appropriate polynucleotides of
the invention, either alone or in combination. Determination of the aggressive nature and/or the
metastatic potential of a breast cancer can also be determined by comparing levels of one or more
polynucleotides of the invention and comparing levels of another sequence known to vary in
cancerous tissue, e.g., ER expression. In addition, development of breast cancer can be detected by
examining the ratio of expression of a differentially expressed polynucleotide to the levels of steroid
hormones (e.g., testosterone or estrogen) or to other hormones (e.g., growth hormone, insulin). Thus,
expression of specific marker polynucleotides can be used to discriminate between normal and
cancerous breast tissue, to discriminate between breast cancers with different cells of origin, to
discriminate between breast cancers with different potential metastatic rates, etc.

Detection of lung cancer. The polynucleotides of the invention can be used to detect lung
cancer in a subject. Although there are more than a dozen different kinds of lung cancer, the two main 
types of lung cancer are small cell and nonsmall cell, which encompass about 90% of all lung cancer 
cases. Small cell carcinoma (also called oat cell carcinoma) usually starts in one of the larger 
bronchial tubes, grows fairly rapidly, and is likely to be large by the time of diagnosis. Nonsmall cell
lung cancer (NSCLC) is made up of three general subtypes of lung cancer. Epidermoid carcinoma 
(also called squamous cell carcinoma) usually starts in one of the larger bronchial tubes and grows 
relatively slowly. The size of these tumors can range from very small to quite large. Adenocarcinoma 
starts growing near the outside surface of the lung and can vary in both size and growth rate. Some 
slowly growing adenocarcinomas are described as alveolar cell cancer. Large cell carcinoma starts
near the surface of the lung, grows rapidly, and the growth is usually fairly large when diagnosed. 
Other less common forms of lung cancer are carcinoid, cylindroma, mucoepidermoid, and malignant 
mesothelioma. 

The polynucleotides of the invention, e.g., polynucleotides differentially expressed in
normal cells versus cancerous lung cells (e.g., tumor cells of high or low metastatic potential) or
between types of cancerous lung cells (e.g., high metastatic versus low metastatic), can be used to
distinguish types of lung cancer as well as to identify traits specific to a certain patient's cancer and
to select an appropriate therapy. For example, if the patient's biopsy expresses a polynucleotide that
is associated with a low metastatic potential, it may justify leaving a larger portion of the patient's
lung in surgery to remove the lesion. Alternatively, a smaller lesion with expression of a
polynucleotide that is associated with high metastatic potential may justify a more radical removal of
lung tissue and/or the surrounding lymph nodes, even if no metastasis can be identified through
pathological examination.

Tumor classification and patient stratification 

The invention further provides for methods of classifying tumors, and thus grouping or
"stratifying" patients, according to the expression profile of selected differentially expressed genes in a
tumor. Differentially expressed genes can be analyzed for correlation with other differentially 
expressed genes in a single tumor type (e.g., a prostate tumor) or between tumor types (e.g., between 
prostate and colon tumors). Genes that demonstrate consistent correlation in expression profile in a
given cancer cell type (e.g., in a prostate cancer cell or type of prostate cancer) can be grouped
together, e.g., when one gene is overexpressed in a tumor, a second gene is also usually 
overexpressed. Tumors can then be classified according to the expression profile of one or more 
genes selected from one or more groups. 
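
One way to group genes whose expression is consistently correlated across tumors is sketched below. The text does not prescribe a particular statistic or grouping procedure; Pearson correlation, the 0.9 threshold, the greedy grouping rule, and the expression values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def group_genes(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first group whose founding gene it tracks."""
    groups = []
    for gene, values in profiles.items():
        for group in groups:
            if pearson(values, profiles[group[0]]) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups


# Hypothetical expression levels of three genes across four tumors.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
}
print(group_genes(profiles))
```

A tumor could then be classified by the expression level of one representative gene from each group, which is one reading of "one or more genes selected from one or more groups".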

The tumor of each patient in a pool of potential patients can be classified as described above.
Patients having similarly classified tumors can then be selected for participation in an investigative or
clinical trial of a cancer therapeutic where a homogeneous population is desired. The tumor
classification of a patient can also be used in assessing the efficacy of a cancer therapeutic in a
heterogeneous patient population. In addition, therapy for a patient having a tumor of a given
expression profile can then be selected accordingly.

Treatment of cancer

The invention further provides methods for reducing growth of cancer cells. In general, the
methods comprise contacting a cancer cell with a substance that modulates (1) expression of a
polynucleotide corresponding to a gene that is differentially expressed in cancer; or (2) a level of
and/or an activity of a cancer-associated polypeptide. In general, the methods provide for decreasing
the expression of a gene that is differentially expressed in a cancer cell (e.g., overexpressed) or
decreasing the level of and/or decreasing an activity of a cancer-associated polypeptide. The methods
also provide for increasing expression of a gene that is underexpressed in a cancer cell or increasing
the level of and/or increasing an activity of a cancer-associated polypeptide.

"Reducing growth of cancer cells" includes, but is not limited to, reducing proliferation of
cancer cells (e.g., prostate, colon, lung, breast, etc. cancer cells), and reducing the incidence of a
non-cancerous cell becoming a cancerous cell. Whether a reduction in cancer cell growth has been
achieved can be readily determined using any known assay, including, but not limited to,
[3H]-thymidine incorporation; counting cell number over a period of time; detecting and/or measuring a
marker associated with the cancer type (e.g., CEA, CA19-9, LASA, PSA, PAP, CA15-3, CA27-29,
NSE, LDH, etc.).

The present invention provides methods for treating cancer, generally comprising
administering to an individual in need thereof a substance that reduces cancer cell growth, in an
amount sufficient to reduce cancer cell growth and treat the cancer. Whether a substance, or a specific 
amount of the substance, is effective in treating cancer can be assessed using any of a variety of 
known diagnostic assays for the particular type of cancer being treated. The substance can be
administered systemically or locally. Thus, in some embodiments, the substance is administered
locally, and cancer growth is decreased at the site of administration. Local administration may be
useful in treating, e.g., a solid tumor. 

A substance that reduces cancer cell growth can be targeted to a cancer cell. Thus, in some
embodiments, the invention provides a method of delivering a drug to a cancer cell, comprising
administering a drug-antibody complex to a subject, wherein the antibody is specific for a particular
cancer-associated polypeptide, and the drug is one that reduces cancer cell growth, a variety of which
are known in the art. Targeting can be accomplished by coupling (e.g., linking, directly or via a linker
molecule, either covalently or non-covalently, so as to form a drug-antibody complex) a drug to an
antibody specific for a particular cancer-associated polypeptide. Methods of coupling a drug to an
antibody are well known in the art and need not be elaborated upon herein.

In another embodiment, differentially expressed gene products (e.g., polypeptides or
polynucleotides encoding such polypeptides) may be effectively used in treatment through
vaccination. The growth of cancer cells is naturally limited in part due to immune surveillance.
Stimulation of the immune system using a particular tumor-specific antigen enhances the effect
towards the tumor expressing the antigen. An active vaccine comprising a polypeptide encoded by the
cDNA of this invention would be appropriately administered to subjects having overabundance of the
corresponding RNA, or those predisposed for developing cancer cells with overabundance of the same
RNA. Polypeptide antigens are typically combined with an adjuvant as part of a vaccine composition.
The vaccine is preferably administered first as a priming dose, and then again as a boosting dose,
usually at least four weeks later. Further boosting doses may be given to enhance the effect. The dose
and its timing are usually determined by the person responsible for the treatment.

The invention also encompasses the selection of a therapeutic regimen based upon the
expression profile of differentially expressed genes in the patient's tumor. For example, a tumor can
be analyzed for its expression profile of the genes corresponding to SEQ ID NOS:1-1542 as described
herein, e.g., the tumor is analyzed to determine which genes are expressed at elevated levels or at
decreased levels relative to normal cells of the same tissue type. The expression patterns of the tumor
are then compared to the expression patterns of tumors that respond to a selected therapy. Where the
expression profile of the test tumor cell and the expression profile of a tumor cell of known drug
responsivity at least substantially match (e.g., selected sets of genes are at elevated levels in the tumor of
known drug responsivity and are also at elevated levels in the test tumor cell), then the drug selected
for therapy is the drug to which tumors with that expression pattern respond.
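
The profile comparison described in the preceding paragraph can be sketched in code. The following Python example is a minimal illustration only, assuming that expression is summarized per gene as a log2 ratio of tumor to normal cells of the same tissue type. The gene labels, the threshold of 1.0, the minimum match fraction of 0.8, and all function names are hypothetical and are not taken from this disclosure.

```python
from typing import Dict, Optional, Set


def elevated_genes(log2_ratios: Dict[str, float], threshold: float = 1.0) -> Set[str]:
    """Return the genes whose tumor/normal log2 ratio is at or above the threshold."""
    return {gene for gene, ratio in log2_ratios.items() if ratio >= threshold}


def match_fraction(test: Dict[str, float],
                   reference: Dict[str, float],
                   threshold: float = 1.0) -> float:
    """Fraction of genes elevated in the reference tumor that are also elevated in the test tumor."""
    reference_up = elevated_genes(reference, threshold)
    if not reference_up:
        return 0.0
    test_up = elevated_genes(test, threshold)
    return len(reference_up & test_up) / len(reference_up)


def select_therapy(test: Dict[str, float],
                   references: Dict[str, Dict[str, float]],
                   min_match: float = 0.8) -> Optional[str]:
    """Pick the drug whose responsive-tumor profile best matches the test tumor.

    Returns None when no reference profile reaches the (assumed) minimum match.
    """
    best_drug: Optional[str] = None
    best_score = 0.0
    for drug, profile in references.items():
        score = match_fraction(test, profile)
        if score >= min_match and score > best_score:
            best_drug, best_score = drug, score
    return best_drug


# Hypothetical data: log2(tumor/normal) per gene.
test_tumor = {"gene_A": 2.1, "gene_B": 1.5, "gene_C": 0.1}
responsive_profiles = {
    "drug_X": {"gene_A": 1.8, "gene_B": 1.2, "gene_C": 0.0},
    "drug_Y": {"gene_A": 0.1, "gene_C": 2.0},
}
chosen = select_therapy(test_tumor, responsive_profiles)
```

In practice the definition of a "substantial match" would depend on the assay platform and on the selected gene set; the set-overlap score used here is only one simple possibility.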

Identification of Therapeutic Targets and Anti-Cancer Therapeutic Agents 
The present invention also encompasses methods for identification of agents having the ability
to modulate activity of a differentially expressed gene product, as well as methods for identifying a
differentially expressed gene product as a therapeutic target for treatment of cancer, especially prostate
cancer.

Candidate agents 

Identification of compounds that modulate activity of a differentially expressed gene product
can be accomplished using any of a variety of drug screening techniques. Such agents are candidates
for development of cancer therapies. Of particular interest are screening assays for agents that have
tolerable toxicity for normal, non-cancerous human cells. The screening assays of the invention are
generally based upon the ability of the agent to modulate an activity of a differentially expressed gene
product and/or to inhibit or suppress a phenomenon associated with cancer (e.g., cell proliferation,
colony formation, cell cycle arrest, metastasis, and the like).

The term "agent" as used herein describes any molecule, e.g., protein or pharmaceutical, with
the capability of modulating a biological activity of a gene product of a differentially expressed gene.
Generally a plurality of assay mixtures are run in parallel with different agent concentrations to obtain
a differential response to the various concentrations. Typically, one of these concentrations serves as a
negative control, i.e., at zero concentration or below the level of detection.
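
As an illustration of running parallel assay mixtures at different agent concentrations, the following Python sketch builds a serial dilution series that includes a zero-concentration negative control. The top concentration, dilution factor, number of points, and the function name are assumptions chosen for the example; the disclosure does not prescribe any particular series.

```python
from typing import List


def concentration_series(top: float, dilution_factor: float, n_points: int) -> List[float]:
    """Return a zero-concentration control followed by n_points serial dilutions, lowest first.

    `top` is the highest agent concentration (any consistent unit) and each
    further point is the previous one divided by `dilution_factor`.
    """
    if top <= 0 or dilution_factor <= 1 or n_points < 1:
        raise ValueError("need top > 0, dilution_factor > 1 and n_points >= 1")
    dilutions = [top / dilution_factor ** i for i in range(n_points)]
    # The 0.0 entry is the negative control described in the text.
    return [0.0] + sorted(dilutions)


# Hypothetical example: three ten-fold dilutions from a top concentration of 100.
series = concentration_series(top=100.0, dilution_factor=10.0, n_points=3)
```

Each entry in the returned list corresponds to one assay mixture; the response at each nonzero concentration would then be compared with the response of the zero-concentration control.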

Candidate agents encompass numerous chemical classes, though typically they are organic
molecules, preferably small organic compounds having a molecular weight of more than 50 and less
than about 2,500 daltons. Candidate agents comprise functional groups necessary for structural
interaction with proteins, particularly hydrogen bonding, and typically include at least an amine,
carbonyl, hydroxyl or carboxyl group, preferably at least two of the functional chemical groups. The
candidate agents often comprise cyclical carbon or heterocyclic structures and/or aromatic or
polyaromatic structures substituted with one or more of the above functional groups. Candidate
agents are also found among biomolecules including, but not limited to: peptides, saccharides, fatty
acids, steroids, purines, pyrimidines, derivatives, structural analogs or combinations thereof.
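
The typical small-molecule criteria stated above (molecular weight between 50 and about 2,500 daltons, and at least one, preferably two, of the listed functional groups) can be expressed as a simple library filter. The Python sketch below is illustrative only: the `Candidate` record, the compound names, and the treatment of "about 2,500" as a hard cutoff are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Functional groups named in the text as typical for interaction with proteins.
INTERACTING_GROUPS = frozenset({"amine", "carbonyl", "hydroxyl", "carboxyl"})


@dataclass(frozen=True)
class Candidate:
    name: str                      # hypothetical library identifier
    molecular_weight: float        # daltons
    functional_groups: FrozenSet[str] = frozenset()


def is_typical_small_molecule(candidate: Candidate, min_groups: int = 1) -> bool:
    """Check the molecular weight window and the number of listed functional groups.

    The upper bound is "about" 2,500 daltons in the text; a hard cutoff is
    used here purely for simplicity.
    """
    n_groups = len(candidate.functional_groups & INTERACTING_GROUPS)
    return 50 < candidate.molecular_weight < 2500 and n_groups >= min_groups


library = [
    Candidate("compound_1", 320.4, frozenset({"amine", "hydroxyl"})),
    Candidate("compound_2", 40.0, frozenset({"amine"})),
    Candidate("compound_3", 3000.0, frozenset({"carboxyl", "carbonyl"})),
    Candidate("compound_4", 410.0, frozenset({"carbonyl"})),
]
# Preferred case from the text: at least two of the functional groups.
preferred = [c.name for c in library if is_typical_small_molecule(c, min_groups=2)]
```

Such a filter only narrows a library to the typical class; biomolecules such as peptides or saccharides, which the text also names as candidate agents, would be handled separately.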

Candidate agents are obtained from a wide variety of sources including libraries of synthetic
or natural compounds. For example, numerous means are available for random and directed synthesis
of a wide variety of organic compounds and biomolecules, including expression of randomized
oligonucleotides and oligopeptides. Alternatively, libraries of natural compounds in the form of
bacterial, fungal, plant and animal extracts (including extracts from human tissue to identify
endogenous factors affecting differentially expressed gene products) are available or readily produced.
Additionally, natural or synthetically produced libraries and compounds are readily modified through
conventional chemical, physical and biochemical means, and may be used to produce combinatorial
libraries. Known pharmacological agents may be subjected to directed or random chemical
modifications, such as acylation, alkylation, esterification, amidification, etc. to produce structural
analogs.

Exemplary candidate agents of particular interest include, but are not limited to, antisense
polynucleotides, antibodies, soluble receptors, and the like. Antibodies and soluble receptors are
of particular interest as candidate agents where the target differentially expressed gene product is
secreted or accessible at the cell surface (e.g., receptors and other molecules stably associated with the
outer cell membrane).

Screening of candidate agents 
Screening assays can be based upon any of a variety of techniques readily available and
known to one of ordinary skill in the art. In general, the screening assays involve contacting a
cancerous cell (preferably a cancerous prostate cell) with a candidate agent, and assessing the effect
upon biological activity of a differentially expressed gene product. The effect upon a biological
activity can be detected by, for example, detection of expression of a gene product of a differentially
expressed gene (e.g., a decrease in mRNA or polypeptide levels, which would in turn cause a decrease in
biological activity of the gene product). Alternatively or in addition, the effect of the candidate agent
can be assessed by examining the effect of the candidate agent in a functional assay. For example,
where the differentially expressed gene product is an enzyme, then the effect upon biological activity
can be assessed by detecting a level of enzymatic activity associated with the differentially expressed
gene product. The functional assay will be selected according to the differentially expressed gene
product. In general, where the differentially expressed gene is increased in expression in a cancerous
cell, agents of interest are those that decrease activity of the differentially expressed gene product.

Assays described infra can be readily adapted in the screening assay embodiments of the
invention. Exemplary assays useful in screening candidate agents include, but are not limited to,
hybridization-based assays (e.g., use of nucleic acid probes or primers to assess expression levels),
antibody-based assays (e.g., to assess levels of polypeptide gene products), binding assays (e.g., to
detect interaction of a candidate agent with a differentially expressed polypeptide, which assays may
be competitive assays where a natural or synthetic ligand for the polypeptide is available), and the like.
Additional exemplary assays include, but are not necessarily limited to, cell proliferation assays,
antisense knockout assays, assays to detect inhibition of cell cycle, assays of induction of cell
death/apoptosis, and the like. Generally such assays are conducted in vitro, but many assays can be
adapted for in vivo analyses, e.g., in an animal model of the cancer.

Identification of therapeutic targets

In another embodiment, the invention contemplates identification of differentially expressed
genes and gene products as therapeutic targets. In some respects, this is the converse of the assays
described above for identification of agents having activity in modulating (e.g., decreasing or
increasing) activity of a differentially expressed gene product.

In this embodiment, therapeutic targets are identified by examining the effect(s) of an agent
that can be demonstrated or has been demonstrated to modulate a cancerous phenotype (e.g., inhibit or
suppress or prevent development of a cancerous phenotype). Such agents are generally referred to
herein as an "anti-cancer agent", which agents encompass chemotherapeutic agents. For example, the
agent can be an antisense oligonucleotide that is specific for a selected gene transcript. For example,
the antisense oligonucleotide may have a sequence corresponding to a sequence of a differentially
expressed gene described herein, e.g., a sequence of one of SEQ ID NOS: 1-2164.

Assays for identification of therapeutic targets can be conducted in a variety of ways using
methods that are well known to one of ordinary skill in the art. For example, a test cancerous cell that
expresses or overexpresses a differentially expressed gene is contacted with an anti-cancer agent, and the
effect upon a cancerous phenotype and a biological activity of the candidate gene product is assessed.
The biological activity of the candidate gene product can be assayed by examining, for example,
modulation of expression of a gene encoding the candidate gene product (e.g., as detected by, for
example, an increase or decrease in transcript levels or polypeptide levels), or modulation of an
enzymatic or other activity of the gene product. The cancerous phenotype can be, for example,
cellular proliferation, loss of contact inhibition of growth (e.g., colony formation), tumor growth (in
vitro or in vivo), and the like. Alternatively or in addition, the effect of modulation of a biological
activity of the candidate target gene upon cell death/apoptosis or cell cycle regulation can be assessed.

Inhibition or suppression of a cancerous phenotype, or an increase in cell death/apoptosis as a
result of modulation of biological activity of a candidate gene product, indicates that the candidate
gene product is a suitable target for cancer therapy. Assays described infra can be readily adapted
for assays for identification of therapeutic targets. Generally such assays are conducted in vitro, but
many assays can be adapted for in vivo analyses, e.g., in an appropriate, art-accepted animal model of
the cancer.

Use of Polynucleotides to Screen for Peptide Analogs and Antagonists 
Polypeptides encoded by the instant polynucleotides and corresponding full-length genes can
be used to screen peptide libraries to identify binding partners, such as receptors, from among the
encoded polypeptides. Peptide libraries can be synthesized according to methods known in the art
(see, e.g., USPN 5,010,175, and WO 91/17823).

Agonists or antagonists of the polypeptides of the invention can be screened using any
available method known in the art, such as signal transduction, antibody binding, receptor binding,
mitogenic assays, chemotaxis assays, etc. The assay conditions ideally should resemble the conditions
under which the native activity is exhibited in vivo, that is, under physiologic pH, temperature, and
ionic strength. Suitable agonists or antagonists will exhibit strong inhibition or enhancement of the
native activity at concentrations that do not cause toxic side effects in the subject. Agonists or
antagonists that compete for binding to the native polypeptide can require concentrations equal to or
greater than the native concentration, while inhibitors capable of binding irreversibly to the
polypeptide can be added in concentrations on the order of the native concentration.

Such screening and experimentation can lead to identification of a novel polypeptide binding
partner, such as a receptor, encoded by a gene or a cDNA corresponding to a polynucleotide of the
invention, and at least one peptide agonist or antagonist of the novel binding partner. Such agonists
and antagonists can be used to modulate, enhance, or inhibit receptor function in cells to which the
receptor is native, or in cells that possess the receptor as a result of genetic engineering. Further, if the
novel receptor shares biologically important characteristics with a known receptor, information about
agonist/antagonist binding can facilitate development of improved agonists/antagonists of the known
receptor.

Vaccines and Uses 

The differentially expressed nucleic acids and polypeptides produced by the nucleic acids of
the invention can also be used to modulate primary immune response to prevent or treat cancer. Every
immune response is a complex and intricately regulated sequence of events involving several cell
types. It is triggered when an antigen enters the body and encounters a specialized class of cells called
antigen-presenting cells (APCs). These APCs capture a minute amount of the antigen and display it in
a form that can be recognized by antigen-specific helper T lymphocytes. The helper (Th) cells
become activated and, in turn, promote the activation of other classes of lymphocytes, such as B cells
or cytotoxic T cells. The activated lymphocytes then proliferate and carry out their specific effector
functions, which in many cases successfully inactivate or eliminate the antigen. Thus, activating the
immune response to a particular antigen associated with a cancer cell can protect the patient from
developing cancer or result in lymphocytes eliminating cancer cells expressing the antigen.

Gene products, including polypeptides, mRNA (particularly mRNAs having distinct
secondary and/or tertiary structures), cDNA, or complete genes, can be prepared and used in vaccines
for the treatment or prevention of hyperproliferative disorders and cancers. The nucleic acids and
polypeptides can be utilized to enhance the immune response, prevent tumor progression, prevent
hyperproliferative cell growth, and the like. Methods for selecting nucleic acids and polypeptides that
are capable of enhancing the immune response are known in the art. Preferably, the gene products for
use in a vaccine are gene products which are present on the surface of a cell and are recognizable by
lymphocytes and antibodies.

The gene products may be formulated with pharmaceutically acceptable carriers into
pharmaceutical compositions by methods known in the art. The composition is useful as a vaccine to
prevent or treat cancer. The composition may further comprise at least one co-immunostimulatory
molecule, including but not limited to one or more major histocompatibility complex (MHC)
molecules, such as a class I or class II molecule, preferably a class I molecule. The composition may
further comprise other stimulator molecules including B7.1, B7.2, ICAM-1, ICAM-2, LFA-1, LFA-3,
CD72 and the like, immunostimulatory polynucleotides (which comprise a 5'-CG-3' wherein the
cytosine is unmethylated), and cytokines which include but are not limited to IL-1 through IL-15,
TNF-α, IFN-γ, RANTES, G-CSF, M-CSF, IFN-α, CTAP III, ENA-78, GRO, I-309, PF-4, IP-10,
LD-78, MGSA, MIP-1α, MIP-1β, or combination thereof, and the like for immunopotentiation. In one
embodiment, the immunopotentiators of particular interest are those which facilitate a Th1 immune
response.

The gene products may also be prepared with a carrier that will protect the gene products
against rapid elimination from the body, such as a controlled release formulation, including implants
and microencapsulated delivery systems. Biodegradable polymers can be used, such as ethylene vinyl
acetate, polyanhydrides, polyglycolic acid, collagen, polyorthoesters, polylactic acid, and the like.
Methods for preparation of such formulations are known in the art.

In the methods of preventing or treating cancer, the gene products may be administered via
one of several routes including but not limited to transdermal, transmucosal, intravenous,
intramuscular, subcutaneous, intradermal, intraperitoneal, intrathecal, intrapleural, intrauterine, rectal,
vaginal, topical, intratumor, and the like. For transmucosal or transdermal administration, penetrants
appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally
known in the art, and include, for example, bile salts and fusidic acid derivatives. In
addition, detergents may be used to facilitate permeation. Transmucosal administration may be by
nasal sprays or suppositories. For oral administration, the gene products are formulated into
conventional oral administration forms such as capsules, tablets and tonics.

The gene product is administered to a patient in an amount effective to prevent or treat cancer.
In general, it is desirable to provide the patient with a dosage of gene product of at least about 1 pg
per kg body weight, preferably at least about 1 ng per kg body weight, more preferably at least about
1 µg or greater per kg body weight of the recipient. A range of from about 1 ng per kg body weight
to about 100 mg per kg body weight is preferred although a lower or higher dose may be
administered. The dose is effective to prime, stimulate and/or cause the clonal expansion of antigen-
specific T lymphocytes, preferably cytotoxic T lymphocytes, which in turn are capable of preventing
or treating cancer in the recipient. The dose is administered at least once and may be provided as a
bolus or a continuous administration. Multiple administrations of the dose over a period of several
weeks to months may be preferable. Subsequent doses may be administered as indicated.
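
Because the dosages above span nine orders of magnitude (pg to mg per kg body weight), a small unit-conversion helper can make the arithmetic explicit. The Python sketch below is illustrative only; the 70 kg body weight, the constant and function names, and the choice of picograms as the internal unit are assumptions, and nothing here constitutes a dosing recommendation.

```python
# Picograms per unit of mass; integers keep the arithmetic exact.
PG_PER_UNIT = {"pg": 1, "ng": 10**3, "ug": 10**6, "mg": 10**9}

# Preferred range stated in the text: about 1 ng/kg to about 100 mg/kg.
PREFERRED_MIN_PG_PER_KG = 1 * PG_PER_UNIT["ng"]
PREFERRED_MAX_PG_PER_KG = 100 * PG_PER_UNIT["mg"]


def total_dose_pg(amount_per_kg: int, unit: str, body_weight_kg: int) -> int:
    """Total dose in picograms for a per-kg dosage and a body weight in kg."""
    return amount_per_kg * PG_PER_UNIT[unit] * body_weight_kg


def in_preferred_range(amount_per_kg: int, unit: str) -> bool:
    """True if the per-kg dosage lies inside the preferred 1 ng/kg to 100 mg/kg range."""
    per_kg_pg = amount_per_kg * PG_PER_UNIT[unit]
    return PREFERRED_MIN_PG_PER_KG <= per_kg_pg <= PREFERRED_MAX_PG_PER_KG


# Hypothetical example: 1 ug per kg for a 70 kg recipient is 70 ug in total.
dose_pg = total_dose_pg(1, "ug", 70)
dose_ug = dose_pg // PG_PER_UNIT["ug"]
```

The text qualifies each bound with "about" and allows lower or higher doses, so the range check is a convenience rather than a limit.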

In another method of treatment, autologous cytotoxic lymphocytes or tumor infiltrating
lymphocytes may be obtained from a patient with cancer. The lymphocytes are grown in culture, and
antigen-specific lymphocytes are expanded by culturing in the presence of the specific gene products
alone or in combination with at least one co-immunostimulatory molecule with cytokines. The
antigen-specific lymphocytes are then infused back into the patient in an amount effective to reduce or
eliminate the tumors in the patient. Cancer vaccines and their uses are further described in USPN
5,961,978; USPN 5,993,829; USPN 6,132,980; and WO 00/38706.

Pharmaceutical Compositions and Uses

Pharmaceutical compositions can comprise polypeptides, receptors that specifically bind a
polypeptide produced by a differentially expressed gene (e.g., antibodies), or polynucleotides
(including antisense nucleotides and ribozymes) of the claimed invention in a therapeutically effective
amount. The compositions can be used to treat primary tumors as well as metastases of primary
tumors. In addition, the pharmaceutical compositions can be used in conjunction with conventional
methods of cancer treatment, e.g., to sensitize tumors to radiation or conventional chemotherapy.

Where the pharmaceutical composition comprises a receptor (such as an antibody) that
specifically binds to a gene product encoded by a differentially expressed gene, the receptor can be
coupled to a drug for delivery to a treatment site or coupled to a detectable label to facilitate imaging
of a site comprising colon cancer cells. Methods for coupling antibodies to drugs and detectable
labels are well known in the art, as are methods for imaging using detectable labels.

The term "therapeutically effective amount" as used herein refers to an amount of a
therapeutic agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a
detectable therapeutic or preventative effect. The effect can be detected by, for example, chemical
markers or antigen levels. Therapeutic effects also include reduction in physical symptoms, such as
decreased body temperature.

The precise effective amount for a subject will depend upon the subject's size and health, the
nature and extent of the condition, and the therapeutics or combination of therapeutics selected for
administration. Thus, it is not useful to specify an exact effective amount in advance. However, the
effective amount for a given situation is determined by routine experimentation and is within the
judgment of the clinician. For purposes of the present invention, an effective dose will generally be
from about 0.01 mg/kg to about 50 mg/kg, or from about 0.05 mg/kg to about 10 mg/kg, of the DNA
constructs in the individual to which it is administered.

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The
term "pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent,
such as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any
pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which can be administered without undue toxicity. Suitable
carriers can be large, slowly metabolized macromolecules such as proteins, polysaccharides, polylactic
acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive virus particles.
Such carriers are well known to those of ordinary skill in the art. Pharmaceutically acceptable carriers
in therapeutic compositions can include liquids such as water, saline, glycerol and ethanol. Auxiliary
substances, such as wetting or emulsifying agents, pH buffering substances, and the like, can also be
present in such vehicles.

Typically, the therapeutic compositions are prepared as injectables, either as liquid solutions
or suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection
can also be prepared. Liposomes are included within the definition of a pharmaceutically acceptable
carrier. Pharmaceutically acceptable salts can also be present in the pharmaceutical composition, e.g.,
mineral acid salts such as hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the
salts of organic acids such as acetates, propionates, malonates, benzoates, and the like. A thorough
discussion of pharmaceutically acceptable excipients is available in Remington's Pharmaceutical
Sciences (Mack Pub. Co., N.J. 1991).

Delivery Methods

Once formulated, the compositions of the invention can be (1) administered directly to the
subject (e.g., as polynucleotides or polypeptides); or (2) delivered ex vivo, to cells derived from the
subject (e.g., as in ex vivo gene therapy). Direct delivery of the compositions will generally be
accomplished by parenteral injection, e.g., subcutaneously, intraperitoneally, intravenously or
intramuscularly, intratumorally or to the interstitial space of a tissue. Other modes of administration
include oral and pulmonary administration, suppositories, and transdermal applications, needles, and
gene guns or hyposprays. Dosage treatment can be a single dose schedule or a multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are
known in the art and described in, e.g., WO 93/14778. Examples of cells useful in ex vivo
applications include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages,
dendritic cells, or tumor cells. Generally, delivery of nucleic acids for both ex vivo and in vitro
applications can be accomplished by, for example, dextran-mediated transfection, calcium phosphate
precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of the
polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well known in
the art.

Once differential expression of a gene corresponding to a polynucleotide of the invention has
been found to correlate with a proliferative disorder, such as neoplasia, dysplasia, and hyperplasia, the
disorder can be amenable to treatment by administration of a therapeutic agent based on the provided
polynucleotide, corresponding polypeptide or other corresponding molecule (e.g., antisense, ribozyme,
etc.). In other embodiments, the disorder can be amenable to treatment by administration of a small
molecule drug that, for example, serves as an inhibitor (antagonist) of the function of the encoded
gene product of a gene having increased expression in cancerous cells relative to normal cells or as an
agonist for gene products that are decreased in expression in cancerous cells (e.g., to promote the
activity of gene products that act as tumor suppressors).

The dose and the means of administration of the inventive pharmaceutical compositions are
determined based on the specific qualities of the therapeutic composition, the condition, age, and
weight of the patient, the progression of the disease, and other relevant factors. For example,
administration of polynucleotide therapeutic composition agents of the invention includes local or
systemic administration, including injection, oral administration, particle gun or catheterized
administration, and topical administration. Preferably, the therapeutic polynucleotide composition
contains an expression construct comprising a promoter operably linked to a polynucleotide of at least
12, 22, 25, 30, or 35 contiguous nt of the polynucleotide of the invention. Various methods can be
used to administer the therapeutic composition directly to a specific site in the body. For example, a
small metastatic lesion is located and the therapeutic composition injected several times in several
different locations within the body of the tumor. Alternatively, arteries that serve a tumor are identified,
and the therapeutic composition injected into such an artery, in order to deliver the composition
directly into the tumor. A tumor that has a necrotic center is aspirated and the composition injected
directly into the now empty center of the tumor. The antisense composition is directly administered to
the surface of the tumor, for example, by topical application of the composition. X-ray imaging is
used to assist in certain of the above delivery methods.



WO 2004/039943    PCT/US2003/015465



Targeted delivery of therapeutic compositions containing an antisense polynucleotide,
subgenomic polynucleotides, or antibodies to specific tissues can also be used. Receptor-mediated
DNA delivery techniques are described in, for example, Findeis et al., Trends Biotechnol. (1993)
11:202; Chiou et al., Gene Therapeutics: Methods And Applications Of Direct Gene Transfer (J.A.
Wolff, ed.) (1994); Wu et al., J. Biol. Chem. (1988) 263:621; Wu et al., J. Biol. Chem. (1994)
269:542; Zenke et al., Proc. Natl. Acad. Sci. (USA) (1990) 87:3655; Wu et al., J. Biol. Chem. (1991)
266:338. Therapeutic compositions containing a polynucleotide are administered in a range of about
100 ng to about 200 mg of DNA for local administration in a gene therapy protocol. Concentration
ranges of about 500 ng to about 50 mg, about 1 microgram to about 2 mg, about 5 micrograms to
about 500 micrograms, and about 20 micrograms to about 100 micrograms of DNA can also be used
during a gene therapy protocol. Factors such as method of action (e.g., for enhancing or inhibiting
levels of the encoded gene product) and efficacy of transformation and expression are considerations
which will affect the dosage required for ultimate efficacy of the antisense subgenomic
polynucleotides.
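
The DNA amounts listed above form a set of progressively narrower ranges. As a minimal sketch, the following Python example records those ranges in nanograms and reports which of them contain a given amount; the variable and function names are hypothetical, and the "about" qualifiers in the text are treated as exact bounds only for the sake of the example.

```python
from typing import List, Tuple

NG_PER_UG = 10**3
NG_PER_MG = 10**6

# (label, lower bound in ng, upper bound in ng), in the order given in the text.
DNA_RANGES_NG: List[Tuple[str, int, int]] = [
    ("100 ng to 200 mg", 100, 200 * NG_PER_MG),
    ("500 ng to 50 mg", 500, 50 * NG_PER_MG),
    ("1 ug to 2 mg", 1 * NG_PER_UG, 2 * NG_PER_MG),
    ("5 ug to 500 ug", 5 * NG_PER_UG, 500 * NG_PER_UG),
    ("20 ug to 100 ug", 20 * NG_PER_UG, 100 * NG_PER_UG),
]


def matching_ranges(amount_ng: int) -> List[str]:
    """Return the labels of every listed range that contains the given DNA amount."""
    return [label for label, low, high in DNA_RANGES_NG if low <= amount_ng <= high]


# Hypothetical example: 50 micrograms of DNA lies inside all five ranges.
fifty_ug = matching_ranges(50 * NG_PER_UG)
```

An amount such as 300 ng falls only within the broadest range, which illustrates how the narrower ranges nest inside the broader ones.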

Where greater expression is desired over a larger area of tissue, larger amounts of antisense
subgenomic polynucleotides or the same amounts readministered in a successive protocol of
administrations, or several administrations to different adjacent or close tissue portions of, for
example, a tumor site, may be required to effect a positive therapeutic outcome. In all cases, routine
experimentation in clinical trials will determine specific ranges for optimal therapeutic effect. For
polynucleotide related genes encoding polypeptides or proteins with anti-inflammatory activity,
suitable use, doses, and administration are described in USPN 5,654,173.

The therapeutic polynucleotides and polypeptides of the present invention can be delivered
using gene delivery vehicles. The gene delivery vehicle can be of viral or non-viral origin (see
generally, Jolly, Cancer Gene Therapy (1994) 1:51; Kimura, Human Gene Therapy (1994) 5:845;
Connelly, Human Gene Therapy (1995) 1:185; and Kaplitt, Nature Genetics (1994) 6:148).
Expression of such coding sequences can be induced using endogenous mammalian or heterologous
promoters. Expression of the coding sequence can be either constitutive or regulated.

Viral-based vectors for delivery of a desired polynucleotide and expression in a desired cell
are well known in the art. Exemplary viral-based vehicles include, but are not limited to, recombinant
retroviruses (see, e.g., WO 90/07936; WO 94/03622; WO 93/25698; WO 93/25234; USPN
5,219,740; WO 93/11230; WO 93/10218; USPN 4,777,127; GB Patent No. 2,200,651; EP 0 345 242;
and WO 91/02805), alphavirus-based vectors (e.g., Sindbis virus vectors, Semliki forest virus (ATCC
VR-67; ATCC VR-1247), Ross River virus (ATCC VR-373; ATCC VR-1246) and Venezuelan
equine encephalitis virus (ATCC VR-923; ATCC VR-1250; ATCC VR-1249; ATCC VR-532)), and
adeno-associated virus (AAV) vectors (see, e.g., WO 94/12649; WO 93/03769; WO 93/19191; WO
94/28938; WO 95/11984 and WO 95/00655). Administration of DNA linked to killed adenovirus, as
described in Curiel, Hum. Gene Ther. (1992) 3:147, can also be employed.

Non-viral delivery vehicles and methods can also be employed, including, but not limited to,
polycationic condensed DNA linked or unlinked to killed adenovirus alone (see, e.g., Curiel, Hum.
Gene Ther. (1992) 3:147); ligand-linked DNA (see, e.g., Wu, J. Biol. Chem. (1989) 264:16985);
eukaryotic cell delivery vehicle cells (see, e.g., USPN 5,814,482; WO 95/07994; WO 96/17072;
WO 95/30763; and WO 97/42338); and nucleic charge neutralization or fusion with cell membranes.
Naked DNA can also be employed. Exemplary naked DNA introduction methods are described in
WO 90/11092 and USPN 5,580,859. Liposomes that can act as gene delivery vehicles are described
in USPN 5,422,120; WO 95/13796; WO 94/23697; WO 91/14445; and EP 0524968. Additional
approaches are described in Philip, Mol. Cell Biol. (1994) 14:2411, and in Woffendin, Proc. Natl.
Acad. Sci. (1994) 91:11581.

Further non-viral delivery suitable for use includes mechanical delivery systems such as the
approach described in Woffendin et al., Proc. Natl. Acad. Sci. USA (1994) 91(24):11581. Moreover,
the coding sequence and the product of expression of such can be delivered through deposition of
photopolymerized hydrogel materials or use of ionizing radiation (see, e.g., USPN 5,206,152 and WO
92/11033). Other conventional methods for gene delivery that can be used for delivery of the coding
sequence include, for example, use of a hand-held gene transfer particle gun (see, e.g., USPN
5,149,655) and use of ionizing radiation for activating a transferred gene (see, e.g., USPN 5,206,152 and
WO 92/11033).

The present invention will now be illustrated by reference to the following examples which 
set forth particularly advantageous embodiments. However, it should be noted that these 
embodiments are illustrative and are not to be construed as restricting the invention in any way. 

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a 
complete disclosure and description of how to make and use the present invention, and are not 
intended to limit the scope of what the inventors regard as their invention nor are they intended to 
represent that the experiments below are all or the only experiments performed. It will be readily
apparent to those skilled in the art that the formulations, dosages, methods of administration, and other
parameters of this invention may be further modified or substituted in various ways without departing
from the spirit and scope of the invention. Efforts have been made to ensure accuracy with respect to
numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be
accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight
average molecular weight, temperature is in degrees Centigrade, and pressure is at or near
atmospheric.

Example 1: Source of Biological Materials and Overview of Novel Polynucleotides Expressed
by the Biological Materials

Candidate polynucleotides that may represent novel polynucleotides were obtained from
cDNA libraries generated from selected cell lines and patient tissues. In order to obtain the candidate
polynucleotides, mRNA was isolated from several selected cell lines and patient tissues, and used to
construct cDNA libraries. The cells and tissues that served as sources for these cDNA libraries are
summarized in Table 1 below.

Human colon cancer cell line KM12L4-A (Morikawa, et al., Cancer Research (1988)
48:6863) is derived from the KM12C cell line. The KM12C cell line (Morikawa et al. Cancer Res.
(1988) 48:1943-1948), which is poorly metastatic (low metastatic), was established in culture from a
Dukes' stage B2 surgical specimen (Morikawa et al. Cancer Res. (1988) 48:6863). The KM12L4-A
is a highly metastatic subline derived from KM12C (Yeatman et al. Nucl. Acids. Res. (1995) 23:4007;
Bao-Ling et al. Proc. Annu. Meet. Am. Assoc. Cancer. Res. (1995) 21:3269). The KM12C and
KM12C-derived cell lines (e.g., KM12L4, KM12L4-A, etc.) are well-recognized in the art as a model
cell line for the study of colon cancer (see, e.g., Morikawa et al., supra; Radinsky et al. Clin. Cancer
Res. (1995) 1:19; Yeatman et al., (1995) supra; Yeatman et al. Clin. Exp. Metastasis (1996) 14:246).

The MDA-MB-231 cell line (Brinkley et al. Cancer Res. (1980) 40:3118-3129) was originally
isolated from pleural effusions (Cailleau, J. Natl. Cancer. Inst. (1974) 53:661), is of high metastatic
potential, and forms poorly differentiated adenocarcinoma grade II in nude mice consistent with breast
carcinoma. The MCF7 cell line was derived from a pleural effusion of a breast adenocarcinoma and is
non-metastatic. The MV-522 cell line is derived from a human lung carcinoma and is of high
metastatic potential. The UCP-3 cell line is a low metastatic human lung carcinoma cell line; the
MV-522 is a high metastatic variant of UCP-3. These cell lines are well-recognized in the art as models for
the study of human breast and lung cancer (see, e.g., Chandrasekaran et al., Cancer Res. (1979)
39:870 (MDA-MB-231 and MCF-7); Gastpar et al., J Med Chem (1998) 41:4965 (MDA-MB-231 and
MCF-7); Ranson et al., Br J Cancer (1998) 77:1586 (MDA-MB-231 and MCF-7); Kuang et al.,
Nucleic Acids Res (1998) 26:1116 (MDA-MB-231 and MCF-7); Varki et al., Int J Cancer (1987)
40:46 (UCP-3); Varki et al. Tumour Biol. (1990) 11:327 (MV-522 and UCP-3); Varki et al.
Anticancer Res. (1990) 10:637 (MV-522); Kelner et al. Anticancer Res (1995) 15:867 (MV-522);
and Zhang et al. Anticancer Drugs (1997) 8:696 (MV-522)).

The samples of libraries 15-20 are derived from two different patients (UC#2 and UC#3).
The bFGF-treated HMVEC were prepared by incubation with bFGF at 10 ng/ml for 2 hrs; the
VEGF-treated HMVEC were prepared by incubation with 20 ng/ml VEGF for 2 hrs. Following incubation
with the respective growth factor, the cells were washed and lysis buffer added for RNA preparation.

GRRpz was derived from normal prostate epithelium. The WOca cell line is a Gleason Grade
4 cell line.

The source materials for generating the normalized prostate libraries of libraries 25 and 26
were cryopreserved prostate tumor tissue from a patient with Gleason grade 3+3 adenocarcinoma and
matched normal prostate biopsies from a pool of at-risk subjects under medical surveillance. The
source materials for generating the normalized prostate libraries of libraries 30 and 31 were
cryopreserved prostate tumor tissue from a patient with Gleason grade 4+4 adenocarcinoma and
matched normal prostate biopsies from a pool of at-risk subjects under medical surveillance.

The source materials for generating the normalized breast libraries of libraries 27, 28 and 29
were cryopreserved breast tissue from a primary breast tumor (infiltrating ductal
carcinoma) (library 28), from a lymph node metastasis (library 29), or matched normal breast biopsies
from a pool of at-risk subjects under medical surveillance. In each case, prostate or breast epithelia
were harvested directly from frozen sections of tissue by laser capture microdissection (LCM,
Arcturus Engineering Inc., Mountain View, CA), carried out according to methods well known in the
art (see, Simone et al. Am J Pathol. 156(2):445-52 (2000)), to provide substantially homogenous cell
samples.

Table 1. Description of cDNA Libraries

Library (lib#) | Description | Number of Clones in Library
0 | Artificial library composed of deselected clones (clones with no associated variant or cluster) | 673
1 | Human Colon Cell Line Km12 L4: High Metastatic Potential (derived from Km12C) | 308731
2 | Human Colon Cell Line Km12C: Low Metastatic Potential | 284771
3 | Human Breast Cancer Cell Line MDA-MB-231: High Metastatic Potential; micro-mets in lung | 326937
4 | Human Breast Cancer Cell Line MCF7: Non Metastatic | 318979
8 | Human Lung Cancer Cell Line MV-522: High Metastatic Potential | 223620
9 | Human Lung Cancer Cell Line UCP-3: Low Metastatic Potential | 312503
12 | Human microvascular endothelial cells (HMEC) - UNTREATED (PCR (OligodT) cDNA library) | 41938
13 | Human microvascular endothelial cells (HMEC) - bFGF TREATED (PCR (OligodT) cDNA library) | 42100
14 | Human microvascular endothelial cells (HMEC) - VEGF TREATED (PCR (OligodT) cDNA library) | 42825
15 | Normal Colon - UC#2 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 282722
16 | Colon Tumor - UC#2 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 298831
17 | Liver Metastasis from Colon Tumor of UC#2 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 303467
18 | Normal Colon - UC#3 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 36216
19 | Colon Tumor - UC#3 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 41388
20 | Liver Metastasis from Colon Tumor of UC#3 Patient (MICRODISSECTED PCR (OligodT) cDNA library) | 30956
21 | GRRpz Cells derived from normal prostate epithelium | 164801
22 | WOca Cells derived from Gleason Grade 4 prostate cancer epithelium | 162088
23 | Normal Lung Epithelium of Patient #1006 (MICRODISSECTED PCR (OligodT) cDNA library) | 306198
24 | Primary tumor, Large Cell Carcinoma of Patient #1006 (MICRODISSECTED PCR (OligodT) cDNA library) | 309349
25 | Normal Prostate Epithelium from Patient IF97-26811 | 279444
26 | Prostate Cancer Epithelium Gleason 3+3 Patient IF97-26811 | 269406
27 | Normal Breast Epithelium from Patient 515 | 239494
28 | Primary Breast tumor from Patient 515 | 259960
29 | Lymph node metastasis from Patient 515 | 326786
30 | Normal Prostate Epithelium from Chiron Patient ID 884 | 298431
31 | Prostate Cancer Epithelium (Gleason 4+4) from Chiron Patient ID 884 | 331941



Characterization of sequences in the libraries 

After using the software program Phred (ver 0.000925.c, Green and Ewing, ©1993-2000) to
select those polynucleotides having the best quality sequence, the polynucleotides were compared
against the public databases to identify any homologous sequences. The sequences of the isolated
polynucleotides were first masked to eliminate low complexity sequences using the RepeatMasker
masking program, publicly available through a web site supported by the University of Washington
(see also Smit, A.F.A. and Green, P., unpublished results). Generally, masking does not influence the
final search results, except to eliminate sequences of relatively little interest due to their low
complexity, and to eliminate multiple "hits" based on similarity to repetitive regions common to
multiple sequences, e.g., Alu repeats.

The remaining sequences were then used in a homology search of the GenBank database
using the TeraBLAST program (TimeLogic, Crystal Bay, Nevada). TeraBLAST is a version of the
publicly available BLAST search algorithm developed by the National Center for Biotechnology
Information, modified to operate at an accelerated speed with increased sensitivity on a specialized
computer hardware platform. The program was run with the default parameters recommended by
TimeLogic to provide the best sensitivity and speed for searching DNA and protein sequences.
Sequences that exhibited greater than 70% overlap, 99% identity, and a p value of less than
1 x 10e-40 were discarded. Sequences from this search also were discarded if the inclusive
parameters were met, but the sequence was ribosomal or vector-derived.
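
By way of illustration only, the discard rule of the GenBank search can be sketched in Python as follows. This is not the TeraBLAST implementation. The `Hit` record and its field names are hypothetical, "99% identity" is assumed to mean at least 99%, "1 x 10e-40" is assumed to mean 1e-40, and the final sentence above is read as meaning that ribosomal or vector-derived sequences are discarded as well.

```python
from dataclasses import dataclass

# Thresholds taken from the text; reading "1 x 10e-40" as 1e-40 is an assumption.
MIN_OVERLAP_PCT = 70.0
MIN_IDENTITY_PCT = 99.0
MAX_P_VALUE = 1e-40


@dataclass
class Hit:
    """Best GenBank hit for one query sequence (hypothetical record layout)."""
    query_id: str
    overlap_pct: float
    identity_pct: float
    p_value: float
    is_ribosomal_or_vector: bool = False


def is_known_sequence(hit: Hit) -> bool:
    """True when the hit meets all three discard thresholds."""
    return (
        hit.overlap_pct > MIN_OVERLAP_PCT
        and hit.identity_pct >= MIN_IDENTITY_PCT
        and hit.p_value < MAX_P_VALUE
    )


def keep_query(hit: Hit) -> bool:
    """Keep a query only if it is neither already known nor ribosomal/vector-derived."""
    return not (is_known_sequence(hit) or hit.is_ribosomal_or_vector)


# Made-up example hits, for illustration only.
hits = [
    Hit("q1", overlap_pct=95.0, identity_pct=99.5, p_value=1e-80),  # known: discarded
    Hit("q2", overlap_pct=40.0, identity_pct=99.5, p_value=1e-80),  # short overlap: kept
    Hit("q3", overlap_pct=10.0, identity_pct=60.0, p_value=1e-3,
        is_ribosomal_or_vector=True),                               # vector: discarded
]
kept = [h.query_id for h in hits if keep_query(h)]
print(kept)
```

All three thresholds must be met together before a sequence is treated as already known, so a near-identical hit that covers only a short stretch of the query does not by itself cause the query to be discarded.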

The resulting sequences from the previous search were classified into three groups (1, 2 and 3
below) and searched in a TeraBLASTX vs. NRP (non-redundant proteins) database search: (1)
unknown (no hits in the GenBank search), (2) weak similarity (greater than 45% identity and p value
of less than 1 x 10e-5), and (3) high similarity (greater than 60% overlap, greater than 80% identity,
and p value less than 1 x 10e-5). Sequences having greater than 70% overlap, greater than 99%
identity, and p value of less than 1 x 10e-40 were discarded.
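
The three-group classification can likewise be sketched as a small Python function. The function name and argument layout are hypothetical, "1 x 10e-5" is assumed to mean 1e-5, and the treatment of hits that fall below the weak-similarity thresholds (grouped here with "unknown") is an assumption, since the text does not state how such hits were handled.

```python
from typing import Optional


def classify_hit(overlap_pct: Optional[float],
                 identity_pct: Optional[float],
                 p_value: Optional[float]) -> str:
    """Assign a query to one of the three groups; None values mean no hit was found."""
    if overlap_pct is None or identity_pct is None or p_value is None:
        return "unknown"
    if p_value < 1e-5:
        # Test the stricter group first, because every high-similarity hit
        # also satisfies the weak-similarity thresholds.
        if overlap_pct > 60.0 and identity_pct > 80.0:
            return "high similarity"
        if identity_pct > 45.0:
            return "weak similarity"
    # Assumption: hits below the weak threshold are grouped with "unknown".
    return "unknown"


print(classify_hit(None, None, None))
print(classify_hit(75.0, 90.0, 1e-20))
print(classify_hit(30.0, 50.0, 1e-8))
```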

The remaining sequences were classified as unknown (no hits), weak similarity, and high
similarity (parameters as above). Two searches were performed on these sequences. First, a
TeraBLAST vs. EST database search was performed and sequences with greater than 99% overlap,
greater than 99% similarity and a p value of less than 1 x 10e-40 were discarded. Sequences with a p
value of less than 1 x 10e-65 when compared to a database sequence of human origin were also
excluded. Second, a TeraBLASTN vs. Patent GeneSeq database search was performed and sequences
having greater than 99% identity, p value less than 1 x 10e-40, and greater than 99% overlap were
discarded.

The remaining sequences were subjected to screening using other rules and redundancies in
the dataset. Sequences with a p value of less than 1 x 10e-111 in relation to a database sequence of
human origin were specifically excluded. The final result provided the sequences listed as SEQ ID
NOS:1-1219 in the accompanying Sequence Listing and summarized in Table 2 (inserted prior to
claims). Each identified polynucleotide represents sequence from at least a partial mRNA transcript.

Summary of polynucleotides of the invention 

Table 2 (inserted prior to claims) provides a summary of polynucleotides isolated as
described. Specifically, Table 2 provides: 1) the SEQ ID NO ("SEQ ID") assigned to each sequence
for use in the present specification; 2) the Cluster Identification No. ("CLUSTER"); 3) the sequence
name ("SEQ NAME") used as an internal identifier of the sequence; 4) the name assigned to the
clone from which the sequence was isolated ("CLONE ID"); and 5) the name of the library from
which the sequence was isolated ("LIBRARY"). Because at
least some of the provided polynucleotides represent partial mRNA transcripts, two or more
polynucleotides may represent different regions of the same mRNA transcript and the same gene
and/or may be contained within the same clone. Thus, for example, if two or more SEQ ID NOS are
identified as belonging to the same clone, then any of those sequences can be used to obtain the
full-length mRNA or gene. Clones which comprise the sequences described herein were deposited as set out in
the tables indicated below (see Example entitled "Deposit Information").
Example 2: Contig Assembly 

The sequences of the polynucleotides provided in the present invention can be used to extend
the sequence information of the gene to which the polynucleotides correspond (e.g., a gene, or mRNA
encoded by the gene, having a sequence of the polynucleotide described herein). This expanded
sequence information can in turn be used to further characterize the corresponding gene, which in turn
provides additional information about the nature of the gene product (e.g., the normal function of the
gene product). The additional information can serve to provide additional evidence of the gene
product's use as a therapeutic target, and provide further guidance as to the types of agents that can
modulate its activity.

For example, a contig was assembled using the sequence of a polynucleotide described herein.
A "contig" is a contiguous sequence of nucleotides that is assembled from nucleic acid sequences
having overlapping (e.g., shared or substantially similar) sequence information. The sequences of
publicly-available ESTs (Expressed Sequence Tags) and the sequences of various of the
above-described polynucleotides were used in the contig assembly. The contig was assembled using the
software program Sequencher, version 4.05, according to the manufacturer's instructions. The
sequence information obtained in the contig assembly was then used to obtain a consensus sequence
derived from the contig using the Sequencher program. The resulting consensus sequence was used to
search both the public databases as well as databases internal to the applicants to match the consensus
polynucleotide with homology data and/or differential gene expression data.
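
The idea of assembling a contig from overlapping sequences can be illustrated with the following minimal Python sketch. It is a toy greedy merge on exact suffix/prefix overlaps and is not the algorithm used by Sequencher; the function names, the minimum overlap length, and the example fragments are all hypothetical. A real assembler also tolerates mismatches, considers both strands, and derives a consensus from aligned reads, none of which is attempted here.

```python
from typing import List, Tuple


def overlap_len(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right` (0 if below min_len)."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def assemble(fragments: List[str], min_len: int = 4) -> List[str]:
    """Greedily merge the pair with the largest exact overlap until no overlap remains."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best: Tuple[int, int, int] = (0, -1, -1)  # (overlap, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Three made-up fragments that overlap end to end by five bases each.
fragments = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCATAA"]
print(assemble(fragments))
```

With the example fragments the three sequences collapse into a single 20-base contig; fragments sharing no overlap of at least `min_len` bases are returned unmerged.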

The final result provided the sequences listed as SEQ ID NOS: 1220-1428 in the
accompanying Sequence Listing and summarized in Tables 3 and 4 (inserted prior to claims). Table 3
provides a summary of the consensus sequences assembled as described. Specifically, Table 3
provides: 1) the SEQ ID NO ("SEQ ID") assigned to each consensus sequence for use in the present
specification; 2) the Cluster Identification No. ("CLUSTER"); and 3) the consensus sequence name
("CONSENSUS SEQ NAME") used as an internal identifier of the sequence.

A correlation between the polynucleotide used in consensus sequence assembly as described
above and the corresponding consensus sequence is contained in Table 4. Specifically, Table 4
provides: 1) the SEQ ID NO of the consensus sequence ("CONSENSUS SEQ ID"); 2) the consensus
sequence name ("CONSENSUS SEQ NAME") used as an internal identifier of the sequence; 3) the
SEQ ID NO of the polynucleotide ("POLYNTD SEQ ID") of SEQ ID NOS: 1-1219 used in assembly
of the consensus sequence; and 4) the sequence name ("POLYNTD SEQ NAME") of the
polynucleotide of SEQ ID NOS: 1-1219 used in assembly of the consensus sequence.

Example 3: Additional Gene Characterization

Sequences of the polynucleotides of SEQ ID NOS: 1-1219 were used as a query sequence in a
TeraBLASTN search of the DoubleTwist Human Genome Sequence Database (DoubleTwist, Inc.,
Oakland, CA), which contains all the human genomic sequences that have been assembled into a
contiguous model of the human genome. Predicted cDNA and protein sequences were obtained
where a polynucleotide of the invention was homologous to a predicted full-length gene sequence.
Alternatively, a sequence of a contig or consensus sequence described herein could be used directly as
a query sequence in a TeraBLASTN search of the DoubleTwist Human Genome Sequence Database.
The final results of the search provided the predicted cDNA sequences listed as SEQ ID NOS:
1429-1485 in the accompanying Sequence Listing and summarized in Table 5 (inserted prior to
claims), and the predicted protein sequences listed as SEQ ID NOS:1486-1542 in the accompanying
Sequence Listing and summarized in Table 6 (inserted prior to claims). Specifically, Table 5
provides: 1) the SEQ ID NO ("SEQ ID") assigned to each cDNA sequence for use in the present
specification; 2) the cDNA sequence name ("cDNA SEQ NAME") used as an internal identifier of the
sequence; 3) the chromosome ("CHROM") containing the gene corresponding to the cDNA sequence;
and 4) the exon ("EXON") of the gene corresponding to the cDNA sequence to which the
polynucleotide of SEQ ID NOS: 1-1219 maps. Table 6 provides: 1) the SEQ ID NO ("SEQ ID")
assigned to each protein sequence for use in the present specification; 2) the protein sequence name
("PROTEIN SEQ NAME") used as an internal identifier of the sequence; 3) the chromosome
("CHROM") containing the gene corresponding to the cDNA sequence; and 4) the exon ("EXON") of
the gene corresponding to the cDNA and protein sequence to which the polynucleotide of SEQ ID
NOS: 1-1219 maps.

A correlation between the polynucleotide used as a query sequence as described above and the
corresponding predicted cDNA and protein sequences is contained in Table 7. Specifically, Table 7
provides: 1) the SEQ ID NO of the cDNA ("cDNA SEQ ID"); 2) the cDNA sequence name ("cDNA
SEQ NAME") used as an internal identifier of the sequence; 3) the SEQ ID NO of the protein
("PROTEIN SEQ ID") encoded by the cDNA sequence; 4) the sequence name of the protein
("PROTEIN SEQ NAME") encoded by the cDNA sequence; 5) the SEQ ID NO of the polynucleotide
("POLYNTD SEQ ID") of SEQ ID NOS: 1-1219 that maps to the cDNA and protein; and 6) the
sequence name ("POLYNTD SEQ NAME") of the polynucleotide of SEQ ID NOS: 1-1219 that maps
to the cDNA and protein.

Through contig and consensus sequence assembly and the use of homology searching
software programs, the sequence information provided herein can be readily extended to confirm, or
confirm a predicted, gene having the sequence of the polynucleotides described in the present
invention. Further, the information obtained can be used to identify the function of the gene product of
the gene corresponding to the polynucleotides described herein. While not necessary to the practice of
the invention, identification of the function of the corresponding gene can provide guidance in the
design of therapeutics that target the gene to modulate its activity and modulate the cancerous
phenotype (e.g., inhibit metastasis, proliferation, and the like).

Example 4: Results of Public Database Search to Identify Function of Gene Products

SEQ ID NOS: 1-1485 were translated in all three reading frames, and the nucleotide sequences
and translated amino acid sequences were used as query sequences to search for homologous sequences in
the GenBank (nucleotide sequences) database. Query and individual sequences were aligned using the
TeraBLAST program available from TimeLogic, Crystal Bay, Nevada. The sequences were masked
to various extents to prevent searching of repetitive sequences or poly-A sequences, using the
RepeatMasker masking program for masking low complexity as described above.

Table 8 (inserted prior to claims) provides the alignment summaries having a p value of 1 x
10e-2 or less indicating substantial homology between the sequences of the present invention and
those of the indicated public databases. Specifically, Table 8 provides: 1) the SEQ ID NO ("SEQ
ID") of the query sequence; 2) the sequence name ("SEQ NAME") used as an internal identifier of the
query sequence; 3) the accession number ("ACCESSION") of the GenBank database entry of the
homologous sequence; 4) a description of the GenBank sequences ("GENBANK DESCRIPTION");
and 5) the score of the similarity of the polynucleotide sequence and the GenBank sequence
("GENBANK SCORE"). The alignments provided in Table 8 are the best available alignment to a
DNA sequence at a time just prior to filing of the present specification. Incorporated by reference is
all publicly available information regarding the sequences listed in Table 8 and their related sequences.
The search program and database used for the alignment, as well as the calculation of the p value, are
also indicated. Full length sequences or fragments of the polynucleotide sequences can be used as
probes and primers to identify and isolate the full length sequence of the corresponding
polynucleotide.
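
The translation of a nucleotide sequence in all three forward reading frames, as used in Example 4, can be sketched in Python as follows. This is an illustration only: the function names and the example sequence are hypothetical, the standard genetic code is assumed, stop codons are written as "*", and any codon containing a non-ACGT character is written as "X".

```python
from itertools import product
from typing import List

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int) -> str:
    """Translate `seq` starting at offset `frame` (0, 1 or 2); trailing partial codons are dropped."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq: str) -> List[str]:
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


# Made-up example sequence.
for frame, protein in enumerate(three_frame_translation("ATGGCCTGATAA")):
    print(frame, protein)
```

Each of the three translations would then serve as a separate protein query alongside the nucleotide sequence itself.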

Example 5: Members of Protein Families

SEQ ID NOS: 1-1219 were used to conduct a profile search as described in the specification
above. Several of the polynucleotides of the invention were found to encode polypeptides having
characteristics of a polypeptide belonging to a known protein family (and thus represent members of
these protein families) and/or comprising a known functional domain. Table 9 (inserted prior to
claims) provides: 1) the SEQ ID NO ("SEQ ID") of the query polynucleotide sequence; 2) the
sequence name ("SEQ NAME") used as an internal identifier of the query sequence; 3) the name
("PFAM NAME") of the profile hit; 4) a brief description of the profile hit ("PFAM
DESCRIPTION"); 5) the score ("SCORE") of the profile hit; 6) the starting nucleotide of the profile
hit ("START"); and 7) the ending nucleotide of the profile hit ("END").

In addition, SEQ ID NOS:1486-1542 were also used to conduct a profile search as described
above. Several of the polypeptides of the invention were found to have characteristics of a
polypeptide belonging to a known protein family (and thus represent members of these protein
families) and/or comprising a known functional domain. Table 10 (inserted prior to claims) provides:
1) the SEQ ID NO ("SEQ ID") of the query protein sequence; 2) the sequence name ("PROTEIN
SEQ NAME") used as an internal identifier of the query sequence; 3) the name ("PFAM NAME") of
the profile hit; 4) a brief description of the profile hit ("PFAM DESCRIPTION"); 5) the score
("SCORE") of the profile hit; 6) the starting residue of the profile hit ("START"); and 7) the ending
residue of the profile hit ("END").

Some SEQ ID NOS exhibited multiple profile hits where the query sequence contains
overlapping profile regions, and/or where the sequence contains two different functional domains.
Each of the profile hits of Tables 9 and 10 is described in more detail below. The acronyms for the
profiles (provided in parentheses) are those used to identify the profile in the Pfam, Prosite, and
InterPro databases. The Pfam database can be accessed through web sites supported by Genome
Sequencing Center at the Washington University School of Medicine or by the European Molecular
Biology Laboratories in Heidelberg, Germany. The Prosite database can be accessed at the ExPASy
Molecular Biology Server on the internet. The InterPro database can be accessed at a web site
supported by the EMBL European Bioinformatics Institute. The public information available on the
Pfam, Prosite, and InterPro databases regarding the various profiles, including but not limited to the
activities, function, and consensus sequences of various protein families and protein domains, is
incorporated herein by reference.

Epidermal Growth Factor (EGF; Pfam Accession No. PF00008). SEQ ID NOS:417 and 418
represent polynucleotides encoding a member of the EGF family of proteins. The distinguishing
characteristic of this family is the presence of a sequence of about thirty to forty amino acid residues
found in epidermal growth factor (EGF) which has been shown to be present, in a more or less
conserved form, in a large number of other proteins (Davis, New Biol. (1990) 2:410-419; Blomquist et
al., Proc. Natl. Acad. Sci. U.S.A. (1984) 81:7363-7367; Barker et al., Protein Nucl. Acid Enz. (1986)
29:54-86; Doolittle et al., Nature (1984) 307:558-560; Appella et al., FEBS Lett. (1988) 231:1-4;
Campbell and Bork, Curr. Opin. Struct. Biol. (1993) 3:385-392). A common feature of the domain is
that the conserved pattern is generally found in the extracellular domain of membrane-bound proteins
or in proteins known to be secreted. The EGF domain includes six cysteine residues which have been
shown to be involved in disulfide bonds. The main structure is a two-stranded beta-sheet followed by
a loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines
strongly vary in length. These consensus patterns are used to identify members of this family:
C-x-C-x(5)-G-x(2)-C and C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C.
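
Consensus patterns written in this PROSITE-style notation can be applied to a protein sequence by converting them to regular expressions. The following Python sketch is illustrative only: the converter `prosite_to_regex` is a hypothetical helper that handles just the syntax elements appearing in this section (single residues, `x`, bracketed alternatives, braced exclusions, and repeat counts), and the example peptide is made up rather than taken from the Sequence Listing.

```python
import re

# One pattern element: a residue, x, [alternatives] or {exclusions},
# optionally followed by a repeat count such as (5) or (4,8).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as C-x-C-x(5)-G-x(2)-C to a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


egf_regex = re.compile(prosite_to_regex("C-x-C-x(5)-G-x(2)-C"))
peptide = "MKCACPEGYSGKNCEL"  # made-up peptide, for illustration only
hit = egf_regex.search(peptide)
print(egf_regex.pattern, hit.start(), hit.group())
```

A match of this kind only flags a candidate region; membership in the family would still be judged from the profile search scores described above.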

Seven Transmembrane Integral Membrane Proteins - Rhodopsin Family (7tm_1: Pfam
Accession No. PF00001). SEQ ID NO:321 corresponds to a sequence encoding a polypeptide that is
a member of the seven transmembrane (7tm) receptor rhodopsin family. G-protein coupled receptors
of the (7tm) rhodopsin family (also called R7G) are an extensive group of hormone,
neurotransmitter, and light receptors which transduce extracellular signals by interaction with
guanine nucleotide-binding (G) proteins (Strosberg, Eur. J. Biochem. (1991) 196:1; Kerlavage, Curr.
Opin. Struct. Biol. (1991) 1:394; Probst et al., DNA Cell Biol. (1992) 11:1; Savarese et al., Biochem.
J. (1992) 283:1). The consensus pattern that contains the conserved triplet and that also spans the
major part of the third transmembrane helix is used to detect this widespread family of proteins:
major part of the third transmembrane helix is used to detect this widespread family of proteins: 
[GSTALIVMFWC]-[GSTANCPDEKEDPKRH}-x(2)-[L^ 
[GSTANC]-[LIVMFYWSTAC]-[PENH]-R^ 

Basic Region Plus Leucine Zipper Transcription Factors (bZIP: Pfam Accession
No. PF00170). SEQ ID NO:638 represents a polynucleotide encoding a novel member of the family
of basic region plus leucine zipper transcription factors. The bZIP superfamily (Hurst, Protein Prof.
(1995) 2:105; and Ellenberger, Curr. Opin. Struct. Biol. (1994) 4:12) of eukaryotic DNA-binding
transcription factors encompasses proteins that contain a basic region mediating sequence-specific
DNA-binding followed by a leucine zipper required for dimerization. The consensus pattern for this
protein family is: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK].

Reverse Transcriptase (rvt: Pfam Accession No. PF00078). SEQ ID NO:137 represents a
polynucleotide encoding a reverse transcriptase, which occurs in a variety of mobile elements,
including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and
caulimoviruses (Xiong and Eickbush, EMBO J (1990) 9:3353-3362). Reverse transcriptases catalyze
RNA-template-directed extension of the 3'-end of a DNA strand by one deoxynucleotide at a time and
require an RNA or DNA primer.

KRAB box (KRAB: Pfam Accession No. PF01352). SEQ ID NO:1012 represents a
polypeptide having a Krueppel-associated box (KRAB). A KRAB box is a domain of around 75
amino acids that is found in the N-terminal part of about one third of eukaryotic Krueppel-type C2H2
zinc finger proteins (ZFPs). It is enriched in charged amino acids and can be divided into subregions
A and B, which are predicted to fold into two amphipathic alpha-helices. The KRAB A and B boxes
can be separated by variable spacer segments and many KRAB proteins contain only the A box.

The KRAB domain functions as a transcriptional repressor when tethered to the template
DNA by a DNA-binding domain. A sequence of 45 amino acids in the KRAB A subdomain has been
shown to be necessary and sufficient for transcriptional repression. The B box does not repress by
itself but does potentiate the repression exerted by the KRAB A subdomain. Gene silencing requires
the binding of the KRAB domain to the RING-B box-coiled coil (RBCC) domain of the
KAP-1/TIF1-beta corepressor. As KAP-1 binds to the heterochromatin protein HP1, it has been proposed that the
KRAB-ZFP-bound target gene could be silenced following recruitment to heterochromatin.

KRAB-ZFPs constitute one of the largest single classes of transcription factors within the
human genome, and appear to play important roles during cell differentiation and development. The
KRAB domain is generally encoded by two exons. The regions coded by the two exons are known as
KRAB-A and KRAB-B.

Armadillo/beta-catenin-like repeat (Armadillo_seg; Pfam Accession No. PF00514). SEQ ID
NO:1486 represents a polypeptide having sequence similarity with the armadillo/beta-catenin-like
repeat (armadillo). The armadillo repeat is an approximately 40 amino acid long tandemly repeated
sequence motif first identified in the Drosophila segment polarity gene armadillo. Similar repeats were
later found in the mammalian armadillo homolog beta-catenin, the junctional plaque protein
plakoglobin, the adenomatous polyposis coli (APC) tumor suppressor protein, and a number of other
proteins (Peifer et al., Cell 76(2):786-791 (1994)).

The three-dimensional fold of an armadillo repeat is known from the crystal structure of
beta-catenin (Rojas et al., Cell 95:105-130 (1998)). There, the 12 repeats form a superhelix of
alpha-helices, with three helices per unit. The cylindrical structure features a positively charged groove which
presumably interacts with the acidic surfaces of the known interaction partners of beta-catenin.

Cadherin domain (cadherin: Pfam Accession No. PF00028). SEQ ID NO:1523 represents a
polypeptide having sequence similarity to a cadherin domain. Cadherins are a family of animal
glycoproteins responsible for calcium-dependent cell-cell adhesion (Takeichi, Annu. Rev. Biochem.
59:237-252 (1990); Takeichi, Trends Genet. 3:213-217 (1987)). Cadherins preferentially interact with
themselves in a homophilic manner in connecting cells, thus acting as both receptor and ligand. A
wide number of tissue-specific forms of cadherins are known, for example: Epithelial (E-cadherin)
(CDH1); Neural (N-cadherin) (CDH2); Placental (P-cadherin) (CDH3); Retinal (R-cadherin) (CDH4);
Vascular endothelial (VE-cadherin) (CDH5); Kidney (K-cadherin) (CDH6); Cadherin-8 (CDH8);
Cadherin-9 (CDH9); Osteoblast (OB-cadherin) (CDH11); Brain (BR-cadherin) (CDH12); T-cadherin
(truncated cadherin) (CDH13); Muscle (M-cadherin) (CDH15); Kidney (Ksp-cadherin) (CDH16); and
Liver-intestine (LI-cadherin) (CDH17).

Structurally, cadherins are built of the following domains: a signal sequence, followed by a
propeptide of about 130 residues, then an extracellular domain of around 600 residues, then a
transmembrane region, and finally a C-terminal cytoplasmic domain of about 150 residues. The
extracellular domain can be sub-divided into five parts: there are four repeats of about 110 residues
followed by a region that contains four conserved cysteines. The calcium-binding region of cadherins
may be located in the extracellular repeats. The signature pattern for the repeated domain is located in
the C-terminal extremity, which is its best conserved region. The pattern includes two conserved
aspartic acid residues and two asparagines; these residues could be implicated in the binding of
calcium. The consensus pattern is: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P.
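The consensus patterns quoted in this section use PROSITE-style syntax, in which x stands for any residue, square brackets list the residues allowed at a position, and a parenthesized number or range gives a repeat count. As an illustration only, such a pattern could be translated into a regular expression and searched against a polypeptide sequence roughly as sketched below. The function name prosite_to_regex and the test peptide are hypothetical and are not part of the disclosure; the cadherin pattern is read here as [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles the elements used in this section: single residues, x (any
    residue), [ABC] (allowed residues), {ABC} (excluded residues) and
    repeat counts such as x(2) or x(4,5).
    """
    parts = []
    for element in pattern.strip().split("-"):
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element.strip())
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # x matches any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {ABC}: any residue except A, B, C
        # [ABC] and single letters are already valid regex syntax.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


# Cadherin repeat signature as read from the text above.
CADHERIN = "[LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P"

# Made-up peptide, used only to show a match; it is not a sequence of the invention.
hit = re.search(prosite_to_regex(CADHERIN), "AAVTVLDVNDNAPKK")
print(hit.group(), hit.start() + 1)
```

A match of this kind only indicates that a sequence is consistent with the signature; the domain assignments in this section rest on Pfam profile searches, not on pattern matching alone.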

CBS domain (CBS: Pfam Accession No. PF00571). SEQ ID NOS: 1510 and 1511 represent
polypeptides having sequence similarity to CBS domains, which are present in all three forms of
cellular life, including two copies in inosine monophosphate dehydrogenase, of which one is
disordered in the crystal structure. A number of disease states are associated with CBS-containing
proteins, including homocystinuria and Becker's and Thomsen disease.

CBS domains are small intracellular modules of unknown function. They are mostly found in
two or four copies within a protein. Pairs of CBS domains dimerise to form a stable globular domain
(Zhang et al., Biochemistry 38:4691-4700 (1999)). Two CBS domains are found in
inosine-monophosphate dehydrogenase from all species; however, the CBS domains are not needed for
activity. CBS domains are found attached to a wide range of other protein domains, suggesting that
CBS domains may play a regulatory role. The region containing the CBS domains in
cystathionine-beta synthase is involved in regulation by S-AdoMet (Zhang et al., Biochemistry
38:4691-4700 (1999)). The 3D structure is found as a sub-domain in the TIM barrel of
inosine-monophosphate dehydrogenase.

Phorbol esters/diacylglycerol binding domain (C1 domain) (DAG_PE-bind; Pfam Accession
No. PF00130). SEQ ID NO: 1514 represents a polypeptide having sequence similarity to the phorbol
esters/diacylglycerol binding domain (C1 domain). Diacylglycerol (DAG) is an important second
messenger. Phorbol esters (PE) are analogues of DAG and potent tumor promoters that cause a
variety of physiological changes when administered to both cells and tissues. DAG activates a family
of serine/threonine protein kinases, collectively known as protein kinase C (PKC) (Azzi et al., Eur. J.
Biochem. 208:547-557 (1992)). Phorbol esters can also directly stimulate PKC.

The N-terminal region of PKC, known as C1, has been shown to bind PE and DAG in a
phospholipid- and zinc-dependent fashion (Ono et al., Proc. Natl. Acad. Sci. U.S.A. 86:4868-4871
(1989)). The C1 region contains one or two copies (depending on the isozyme of PKC) of a
cysteine-rich domain about 50 amino-acid residues long and essential for DAG/PE-binding. The
DAG/PE-binding domain binds two zinc ions; the ligands of these metal ions are probably the six
cysteines and two histidines that are conserved in the C1 domain. The consensus sequence for the C1
domain is: H-x-[LIVMFYW]-x(8,11)-C-x(2)-C-x... [the C and H are involved in binding zinc].

GATA zinc finger (GATA; Pfam Accession No. PF00320). SEQ ID NO: 1520 represents a
polypeptide having sequence similarity to a GATA zinc finger. A number of transcription factors,
including erythroid-specific transcription factor and nitrogen regulatory proteins, specifically bind the
DNA sequence (A/T)GATA(A/G) in the regulatory regions of genes (Yamamoto et al., Genes Dev.
4:1650-1662 (1990)) and are consequently termed GATA-binding transcription factors. The
interactions occur via highly conserved zinc finger domains in which the zinc ion is coordinated by
four cysteine residues (Evans and Felsenfeld, Cell 58:877-885 (1989); Omichinski et al., Science
261:438-446 (1993)).
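By way of illustration, candidate sites matching the (A/T)GATA(A/G) sequence could be located in a nucleotide sequence, on either strand, with a short script such as the following sketch. The function names and the example sequence are hypothetical and do not appear in the disclosure; a sequence match alone does not establish that a GATA factor binds the site.

```python
import re

# (A/T)GATA(A/G) as a regular expression; the lookahead also reports
# overlapping occurrences.
GATA_SITE = re.compile(r"(?=([AT]GATA[AG]))")
SITE_LENGTH = 6
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_gata_sites(seq: str):
    """Return (strand, 0-based start on the forward strand, site) tuples."""
    seq = seq.upper()
    hits = [("+", m.start(), m.group(1)) for m in GATA_SITE.finditer(seq)]
    for m in GATA_SITE.finditer(reverse_complement(seq)):
        # Convert the reverse-strand coordinate back to the forward strand.
        start = len(seq) - (m.start() + SITE_LENGTH)
        hits.append(("-", start, m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up sequence carrying one site on each strand.
example_sites = find_gata_sites("CCAGATAAGGTTATCTCC")
print(example_sites)
```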

NMR studies have shown the core of the zinc finger to comprise two irregular anti-parallel
beta-sheets and an alpha-helix, followed by a long loop to the C-terminal end of the finger. The
N-terminal part, which includes the helix, is similar in structure, but not sequence, to the N-terminal
zinc module of the glucocorticoid receptor DNA-binding domain. The helix and the loop connecting
the two beta-sheets interact with the major groove of the DNA, while the C-terminal tail wraps around
into the minor groove. It is this tail that is the essential determinant of specific binding. Interactions
between the zinc finger and DNA are mainly hydrophobic, explaining the preponderance of thymines
in the binding site; a large number of interactions with the phosphate backbone have also been
observed (Omichinski et al., Science 261:438-446 (1993)). Two GATA zinc fingers are found in the
GATA transcription factors; however, there are several proteins which contain only a single copy of
the domain. The consensus sequence of the domain is:
C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C [The four C's are zinc ligands].

Glutathione S-transferase, N-terminal domain (GST N: Pfam Accession No. PF02798). SEQ
ID NO: 1507 represents a polypeptide having sequence similarity to the glutathione S-transferase
N-terminal domain. In eukaryotes, glutathione S-transferases (GSTs) participate in the detoxification
of reactive electrophilic compounds by catalysing their conjugation to glutathione. The GST domain
is also found in S-crystallins from squid, and in proteins with no known GST activity, such as
eukaryotic elongation factors 1-gamma and the HSP26 family of stress-related proteins, which include
auxin-regulated proteins in plants and stringent starvation proteins in E. coli. The major lens
polypeptide of Cephalopoda is also a GST.

Bacterial GSTs of known function often have a specific, growth-supporting role in
biodegradative metabolism: epoxide ring opening and tetrachlorohydroquinone reductive
dehalogenation are two examples of the reactions catalysed by these bacterial GSTs. Some regulatory
proteins, like the stringent starvation proteins, also belong to the GST family. GST seems to be absent
from Archaea, in which gamma-glutamylcysteine substitutes for glutathione as the major thiol.

Glutathione S-transferases form homodimers, but in eukaryotes can also form heterodimers of 
the A1 and A2 or YC1 and YC2 subunits. The homodimeric enzymes display a conserved structural
fold. Each monomer is composed of a distinct N-terminal sub-domain, which adopts the thioredoxin 
fold, and a C-terminal all-helical sub-domain. 

GTF2I-like repeat (GTF2I: Pfam Accession No. PF02946). SEQ ID NOS: 1500, 1501, and
1542 represent polypeptides having sequence similarity to proteins having a GTF2I-like repeat. This
region of sequence similarity is found up to six times in a variety of proteins including GTF2I. It has
been suggested that this may be a DNA binding domain (O'Mahoney et al., Mol. Cell. Biol.
18:6641-6652 (1998); Osborne et al., Genomics 57:279-284 (1999)).

Core histone H2A/H2B/H3/H4 (histone: Pfam Accession No. PF00125). SEQ ID NO: 1497
represents a polypeptide having sequence similarity to core histone H2A/H2B/H3/H4 family
polypeptides. Histone H2A is one of the four histones, along with H2B, H3 and H4, which form the
eukaryotic nucleosome core. Using alignments of histone H2A sequences (Wells and Brown, Nucleic
Acids Res. 19:2173-2188 (1991); Thatcher and Gorovsky, Nucleic Acids Res. 22:174-179 (1994)), a
conserved region in the N-terminal part of H2A was used to develop a signature pattern. This region
is conserved both in classical S-phase regulated H2A's and in variant histone H2A's which are
synthesized throughout the cell cycle. The consensus pattern is: [AC]-G-L-x-F-P-V.

Histone H4, along with H3, plays a central role in nucleosome formation. The sequence of
histone H4 has remained almost invariant in more than 2 billion years of evolution (Thatcher and
Gorovsky, Nucleic Acids Res. 22:174-179 (1994)). The region used as a signature pattern is a
pentapeptide found in positions 14 to 18 of all H4 sequences. It contains a lysine residue which is
often acetylated (Doenecke and Gallwitz, Mol. Cell. Biochem. 44:113-128 (1982)) and a histidine
residue which is implicated in DNA-binding (Ebralidse et al., Nature 331:365-367 (1988)). The
consensus pattern is: G-A-K-R-H.
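As a simple check of the positions quoted above, the H4 pentapeptide can be located in an H4 amino-terminal sequence. The sketch below assumes the commonly reported mammalian H4 amino terminus, numbered from the first residue after removal of the initiator methionine; that sequence and the helper name signature_position are assumptions supplied for illustration and are not taken from the sequence listing.

```python
# Assumed H4 amino terminus (initiator methionine removed); illustrative only.
H4_N_TERMINUS = "SGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRR"


def signature_position(sequence: str, signature: str):
    """Return the 1-based (start, end) of the first exact match, or None."""
    index = sequence.find(signature)
    if index < 0:
        return None
    return index + 1, index + len(signature)


h4_span = signature_position(H4_N_TERMINUS, "GAKRH")
print(h4_span)

# Residues inside the signature that the text singles out (1-based numbering).
lysine_position = h4_span[0] + "GAKRH".index("K")
histidine_position = h4_span[0] + "GAKRH".index("H")
print(lysine_position, histidine_position)
```

Under this numbering the lysine and histidine of the signature fall at positions 16 and 18; a sequence that retains the initiator methionine would be shifted by one.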

Histone H3 is a highly conserved protein of 135 amino acid residues (Wells and Brown,
Nucleic Acids Res. 19:2173-2188 (1991); Thatcher and Gorovsky, Nucleic Acids Res.
22:174-179 (1994)). Two signature patterns have been developed: the first corresponds to a perfectly
conserved heptapeptide in the N-terminal part of H3, while the second is derived from a
conserved region in the central section of H3. The consensus patterns are: K-A-P-R-K-Q-L and
P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV].

The signature pattern of histone H2B corresponds to a conserved region in the C-terminal
part of the protein. The consensus pattern is:
[KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]-E-G.

HMG (high mobility group) box (HMG box: Pfam Accession No. PF00505). SEQ ID
NO: 1525 corresponds to a polypeptide having sequence similarity to high mobility group proteins, a
family of relatively low molecular weight non-histone components in chromatin. HMG1 (also called
HMG-T in fish) and HMG2 (Bustin et al., Biochim. Biophys. Acta 1049:231-243 (1990)) are two
highly related proteins that bind single-stranded DNA preferentially and unwind double-stranded
DNA. HMG1/2 have about 200 amino acid residues with a highly acidic C-terminal section which is
composed of an uninterrupted stretch of from 20 to 30 aspartic and glutamic acid residues; the rest of
the protein sequence is very basic. In addition to the HMG1 and HMG2 proteins, HMG-domains
occur in single or multiple copies in the following protein classes: the SOX family of transcription
factors; SRY sex determining region Y protein and related proteins; LEF1 lymphoid enhancer binding
factor 1; SSRP recombination signal recognition protein; MTF1 mitochondrial transcription factor 1;
UBF1/2 nucleolar transcription factors; Abf2 yeast ARS-binding factor; and yeast transcription factors
Ixr1, Rox1, Nhp6a, Nhp6b and Spp41.

Importin beta binding domain (IBB: Pfam Accession No. PF01749). SEQ ID NO: 1486
represents a polypeptide having sequence similarity to importin beta binding domain family
polypeptides. This family consists of the importin alpha (karyopherin alpha), importin beta
(karyopherin beta) binding domain. The domain mediates formation of the importin alpha beta
complex, which is required for classical NLS import of proteins into the nucleus, through the nuclear
pore complex and across the nuclear envelope. Also in the alignment is the NLS of importin alpha,
which overlaps with the IBB domain (Moroianu et al., Proc. Natl. Acad. Sci. U.S.A.
93:6572-6576 (1996)).

T-box domain (T-box: Pfam Accession No. PF00907). SEQ ID NO: 1518 represents a
polypeptide having sequence similarity to proteins having a T-box domain. The T-box gene family is
an ancient group of putative transcription factors that appear to play a critical role in the development
of all animal species. These genes were uncovered on the basis of similarity to the DNA binding
domain (Papaioannou and Silver, Bioessays 20:9-19 (1998)) of the murine Brachyury (T) gene product,
which similarity is the defining feature of the family. The Brachyury gene is named for its phenotype,
which was identified 70 years ago as a mutant mouse strain with a short blunted tail. The gene, and its
paralogues, have become a well-studied model for the family, and hence much of what is known about
the T-box family is derived from the murine Brachyury gene.

Consistent with its nuclear location, Brachyury protein has a sequence-specific DNA-binding
activity and can act as a transcriptional regulator (Wattler et al., Genomics 48:24-33 (1998)).
Homozygous mutants for the gene undergo extensive developmental anomalies, thus rendering the
mutation lethal (Kavka and Green, Biochim. Biophys. Acta 1333(2) (1997)). The postulated role of
Brachyury is as a transcription factor, regulating the specification and differentiation of posterior
mesoderm during gastrulation in a dose-dependent manner (Papaioannou and Silver, Bioessays
20:9-19 (1998)).

Common features shared by T-box family members are DNA-binding and transcriptional
regulatory activity, a role in development, and conserved expression patterns. Most of the known
genes in all species are expressed in mesoderm or mesoderm precursors (Papaioannou, Trends Genet.
13:212-213 (1997)). Members of the T-box family contain a domain of about 170 to 190 amino acids,
known as the T-box domain (Papaioannou, Trends Genet. 13:212-213 (1997); Bollag et al., Nat.
Genet. 7:383-389 (1994); Agulnik et al., Genetics 144:249-254 (1996)), which probably binds
DNA. As signature patterns for the T-domain, we selected two conserved regions. The first region
corresponds to the N-terminal of the domain and the second one to the central part. The consensus
sequences are: L-W-x(2)-[FC]-x(3,4)-[NT]-E-M-[LIV](2)-T-x(2)-G-[RG]-[KRQ] and
[LIVMFYW]-H-[PADH]-[DENQ]-[GS]-x(3)-G-x(2)-W-M-x(3)-[TVA]-x-F.

60s Acidic ribosomal protein (60s ribosomal: Pfam Accession No. PF00428). SEQ ID NO:
905 represents a polynucleotide encoding a member of the 60s acidic ribosomal protein family. The
60S acidic ribosomal protein plays an important role in the elongation step of protein synthesis. This
family includes archaebacterial L12 and eukaryotic P0, P1 and P2 (Remacha et al., Biochem. Cell Biol.
73:959-968 (1995)).

Some of the proteins in this family are allergens. A nomenclature system has been established
for antigens (allergens) that cause IgE-mediated atopic allergies in humans (WHO/IUIS Allergen
Nomenclature Subcommittee, King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills
T.A.E., Thomas W., Bull. World Health Organ. 72:797-806 (1994)). This nomenclature system is
defined by a designation that is composed of the first three letters of the genus; a space; the first letter
of the species name; a space; and an arabic number. In the event that two species names have identical
designations, they are discriminated from one another by adding one or more letters (as necessary) to
each species designation. The allergens in this family include allergens with the following
designations: Alt a 6, Alt a 12, Cla h 3, Cla h 4, and Cla h 12.
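The designation rule described above can be written out as a short function, shown here only as a sketch. The function name is hypothetical, and the genus and species names used in the example (Alternaria alternata and Cladosporium herbarum) are assumed to be the source organisms behind the "Alt a" and "Cla h" designations; they are not stated in the text. The disambiguation step for species with identical designations is not implemented.

```python
def allergen_designation(genus: str, species: str, number: int) -> str:
    """Build a designation: three genus letters, one species letter, a number."""
    genus_part = genus.strip()[:3].capitalize()   # first three letters of the genus
    species_part = species.strip()[:1].lower()    # first letter of the species name
    return f"{genus_part} {species_part} {number}"


# Assumed source organisms, used for illustration only.
print(allergen_designation("Alternaria", "alternata", 6))
print(allergen_designation("Cladosporium", "herbarum", 12))
```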

AP endonuclease family 1 (AP endonucleas1: Pfam Accession No. PF01260). SEQ ID
NOS: 358 and 836 correspond to polynucleotides encoding members of the family of polypeptides
designated AP endonuclease family 1. DNA damaging agents such as the antitumor drugs bleomycin
and neocarzinostatin, or those that generate oxygen radicals, produce a variety of lesions in DNA.
Amongst these is base loss, which forms apurinic/apyrimidinic (AP) sites, or strand breaks with
atypical 3'-termini. DNA repair at the AP sites is initiated by specific endonuclease cleavage of the
phosphodiester backbone. Such endonucleases are also generally capable of removing blocking
groups from the 3'-terminus of DNA strand breaks.

AP endonucleases can be classified into two families on the basis of sequence similarity. This
family contains members of AP endonuclease family 1. Except for Rrp1 and arp, these enzymes are
proteins of about 300 amino-acid residues. Rrp1 and arp both contain additional and unrelated
sequences in their N-terminal section (about 400 residues for Rrp1 and 270 for arp). The proteins
contain a glutamate which has been shown, in the Escherichia coli enzyme, to bind a divalent metal
ion such as magnesium or manganese (Mol et al., Nature 374:381-386 (1995)). The consensus
sequences for this family of polypeptides are: [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K [E binds a
divalent metal ion]; D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2); and
N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S.

Bowman-Birk serine protease inhibitor family (Bowman-Birk leg: Pfam Accession No.
PF00228). SEQ ID NO: 321 represents a polynucleotide encoding a polypeptide having sequence
similarity to a member of the Bowman-Birk serine protease inhibitor family. The Bowman-Birk
inhibitor family (Laskowski and Kato, Annu. Rev. Biochem. 49:593-626 (1980)) is one of the
numerous families of serine proteinase inhibitors; it has a duplicated structure and generally
possesses two distinct inhibitory sites.

These inhibitors are found in the seeds of all leguminous plants as well as in cereal grains. In
cereals they exist in two forms, one of which is a duplication of the basic structure (Tashiro et al., J.
Biochem. 102:297-306 (1987)). The signature pattern for sequences belonging to this family of
inhibitors is in the central part of the domain and includes four cysteines. The consensus pattern is:
C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDEKS]-[DEKRHSTA]-C [The
four C's are involved in disulfide bonds]. Note that this pattern can be found twice in some duplicated
cereal inhibitors.

Cation efflux family (Cation efflux: Pfam Accession No. PF01545). SEQ ID NO: 321
encodes a polypeptide having sequence similarity to members of the cation efflux family of proteins.
Members of this family are integral membrane proteins that are found to increase tolerance to divalent
metal ions such as cadmium, zinc, and cobalt. These proteins are thought to be efflux pumps that
remove these ions from cells (Xiong and Jayaswal, J. Bacteriol. 180:4024-4029 (1998); Kunito et al.,
Biosci. Biotechnol. Biochem. 60:699-704 (1996)).

DC1 domain (DC1: Pfam Accession No. PF03107). SEQ ID NO: 89 corresponds to a
polypeptide having sequence similarity to a DC1 domain. This short domain is rich in cysteines and
histidines. The pattern of conservation is similar to that found in DAG_PE-bind (Pfam Accession No.
PF00130); therefore, this domain has been termed DC1 for divergent C1 domain. Like the
DAG_PE-bind domain, this domain probably also binds two zinc ions. The function of proteins with
this domain is uncertain; however, this domain may bind to molecules such as diacylglycerol. This
family is found in plant proteins.

Pneumovirus attachment glycoprotein G (Glycoprotein G: Pfam Accession No. PF00802).
SEQ ID NO: 995 represents a polypeptide having sequence similarity to members of the Pneumovirus
attachment glycoprotein G protein family. This family includes attachment proteins from respiratory
syncytial virus. Glycoprotein G has not been shown to have any neuraminidase or hemagglutinin
activity. The amino terminus is thought to be cytoplasmic, and the carboxyl terminus extracellular.
The extracellular region contains four completely conserved cysteine residues.

NADH-Ubiquinone/plastoquinone (complex I), various chains (oxidored q1: Pfam Accession
No. PF00361). SEQ ID NO: 413 represents a polypeptide having sequence similarity to the NADH-
Ubiquinone/plastoquinone (complex I), various chains protein family. This family is part of the
NADH:ubiquinone oxidoreductase (complex I), which catalyses the transfer of two electrons from
NADH to ubiquinone in a reaction that is associated with proton translocation across the membrane
(Walker, Q. Rev. Biophys. 25:253-324 (1992)). Sub-families within this protein family include
NADH-ubiquinone oxidoreductase chain 5; NADH-ubiquinone oxidoreductase chain 2;
NADH-ubiquinone oxidoreductase chain 4; and multicomponent K+:H+ antiporter.

Protamine P1 (protamine P1: Pfam Accession No. PF00260). SEQ ID NOS: 645 and 1217
represent polypeptides having sequence similarity to the Protamine P1 protein family. Protamines are
small, highly basic proteins that substitute for histones in sperm chromatin during the haploid phase
of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex.
There are two different types of mammalian protamine, called P1 and P2. P1 has been found in all
species studied, while P2 is sometimes absent. There also seems to be a single type of avian
protamine whose sequence is closely related to that of mammalian P1 (Oliva et al., J. Biol. Chem.
264:17627-17630 (1989)). A conserved region at the N-terminal extremity of the sequence is used as
a signature pattern for this family of proteins. The consensus pattern is:
[AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S.

Squash family serine protease inhibitor (squash: Pfam Accession No. PF00299). SEQ ID
NO: 995 represents a polypeptide having sequence similarity to Squash family serine protease inhibitor
proteins. The squash inhibitors form one of a number of serine protease inhibitor families. The
proteins, found in the seeds of Cucurbitaceae plants (squash, cucumber, balsam pear, etc.), are
approximately 30 residues in length and contain 6 Cys residues, which form 3 disulfide bonds (Bode
et al., FEBS Lett. 242:285-292 (1989)). The inhibitors function by being taken up by a serine
protease (such as trypsin), which cleaves the peptide bond between Arg/Lys and Ile residues in the
N-terminal portion of the protein (Bode et al., FEBS Lett. 242:285-292 (1989); Krishnamoorthi et al.,
Biochemistry 31:898-904 (1992)). Structural studies have shown that the inhibitor has an ellipsoidal
shape and is largely composed of beta-turns (Bode et al., FEBS Lett. 242:285-292 (1989)). The fold
and Cys connectivity of the proteins resemble those of potato carboxypeptidase A inhibitor
(Krishnamoorthi et al., Biochemistry 31:898-904 (1992)). The pattern used to detect this family of
proteins spans the major part of the sequence and includes five of the six cysteines involved in
disulfide bonds. The consensus pattern is: C-P-x(5)-C-x(2)-[DN]-x-D-C-x(3)-C-x-C [The five C's are
involved in disulfide bonds].

Metallothionein family 5 (Metallothio 5: Pfam Accession No. PF02067). SEQ ID NO: 995
represents a polypeptide having sequence similarity to metallothionein family 5 proteins.
Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium, and
nickel. They have a high content of cysteine residues that bind the metal ions through clusters of
thiolate bonds (Kagi, Meth. Enzymol. 205:613-626 (1991); Kagi and Kojima, Experientia Suppl.
52:25-61 (1987); Kagi and Schaffer, Biochemistry 27:8509-8515 (1988)).

Due to limitations in the original classification system of MTs, which did not allow clear
differentiation of patterns of structural similarities, either between or within classes, all class I and
class II MTs (the proteinaceous sequences) have now been grouped into families of
phylogenetically-related and thus alignable sequences. Diptera (Drosophila, family 5) MTs are 40-43
residue proteins that contain 10 conserved cysteines arranged in five Cys-X-Cys groups. In particular,
the consensus pattern C-G-x(2)-C-x-C-x(2)-Q-x(5)-C-x-C-x(2)-D-C-x-C has been found to be
diagnostic of family 5 MTs. The protein is found primarily in the alimentary canal, and its induction is
stimulated by ingestion of cadmium or copper (Lastowski et al., J. Biol. Chem. 260:1527-1530 (1985)).
Mercury, silver and zinc induce the protein to a lesser extent.

Caenorhabditis elegans Sre G protein-coupled chemoreceptor (Sre: Pfam Accession No.
PF03125). SEQ ID NO: 591 represents a polypeptide having sequence similarity to C. elegans Sre G
protein-coupled chemoreceptor family proteins. C. elegans Sre proteins are candidate chemosensory
receptors. There are four main recognized groups of such receptors: Odr-10, Sra, Sro, and Srg. Sre
(this family), Sra and Srb comprise the Sra group. All of the above receptors are thought to be
G protein-coupled seven transmembrane domain proteins (Troemel, Bioessays 21:1011-1020 (1999);
Troemel et al., Cell 83:207-218 (1995)).

Syndecan domain (Syndecan: Pfam Accession No. PF01034). SEQ ID NO: 995 corresponds
to a polypeptide having a syndecan domain. Syndecans (Bernfield et al., Annu. Rev. Cell Biol.
8:365-393 (1992); David, FASEB J. 7:1023-1030 (1993)) are a family of transmembrane heparan sulfate
proteoglycans which are implicated in the binding of extracellular matrix components and growth






WO 2004/039943 



PCT/LS2003/015465 



factors. Syndecans bind a variety of molecules via their heparan sulfate chains and can act as 
receptors or as co-receptors. Structurally, these proteins consist of four separate domains: a) a signal 
sequence; b) an extracellular domain (ectodomain) of variable length containing the sites of 
attachment of the heparan sulfate glycosaminoglycan side chains and whose sequence is not 
evolutionarily conserved in the various forms of syndecans; c) a transmembrane region; and d) a
highly conserved cytoplasmic domain of about 30 to 35 residues which could interact with 
cytoskeletal proteins. 

The signature pattern for syndecans starts with the last residue of the transmembrane region 
and includes the first 10 residues of the cytoplasmic domain. This region, which contains four basic
residues, may act as a stop transfer site. The consensus pattern is: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y.

L1 transposable element (Transposase 22: Pfam Accession No. PF02994). SEQ ID NO: 774
represents a polypeptide having an L1 transposable element. Many human L1 elements are capable of
retrotransposition and some of these have been shown to exhibit reverse transcriptase (RT) activity
(Sassaman et al., Nat. Genet. 16(1):37-43 (1997)), although the function of many is, as yet, unknown.
An estimated 30-60 active L1 elements reside in the average diploid genome.

WW domain (WW: Pfam Accession No. PF00397). SEQ ID NO: 431 represents a
polypeptide having a WW domain. The WW domain (also known as rsp5 or WWP) is a short
conserved region in a number of unrelated proteins, among them dystrophin, responsible for
Duchenne muscular dystrophy. This short domain may be repeated up to four times in some proteins
(Bork and Sudol, Trends Biochem. Sci. 19:531-533 (1994); Andre and Springael, Biochem. Biophys.
Res. Commun. 205:1201-1205 (1994); Hofmann and Bucher, FEBS Lett. 358:153-157 (1995); Sudol
et al., FEBS Lett. 369:67-71 (1995)). The WW domain binds to proteins with particular proline
motifs, [AP]-P-P-[AP]-Y, and has four conserved aromatic positions that are generally Trp (Chen
and Sudol, Proc. Natl. Acad. Sci. U.S.A. 92:7819-7823 (1995)). The name WW or WWP derives
from the presence of these Trp as well as that of a conserved Pro. The WW domain is frequently
associated with other domains typical for proteins in signal transduction processes.

A large variety of proteins containing the WW domain are known. These include: dystrophin,
a multidomain cytoskeletal protein; utrophin, a dystrophin-like protein of unknown function;
vertebrate YAP protein, substrate of an unknown serine kinase; mouse NEDD-4, involved in the
embryonic development and differentiation of the central nervous system; yeast RSP5, similar to
NEDD-4 in its molecular organization; rat FE65, a transcription-factor activator expressed
preferentially in liver; tobacco DB10 protein; and others. The consensus pattern is:
W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P.

Example 6: Detection of Differential Expression Using Arrays and source of patient tissue samples

mRNA isolated from samples of cancerous and normal breast and colon tissue obtained from
patients was analyzed to identify genes differentially expressed in cancerous and normal cells.
Normal and cancerous tissues were collected from patients using laser capture microdissection (LCM)
techniques, which are well known in the art (see, e.g., Ohyama et al. (2000) Biotechniques
29:530-6; Curran et al. (2000) Mol. Pathol. 53:64-8; Suarez-Quian et al. (1999) Biotechniques
26:328-35; Simone et al. (1998) Trends Genet. 14:272-6; Conia et al. (1997) J. Clin. Lab. Anal.
11:28-38; Emmert-Buck et al. (1996) Science 274:998-1001).

Table 11 (inserted prior to claims) provides information about each patient from which colon
tissue samples were isolated, including: the Patient ID ("PT ID") and Path ReportID ("Path ID"),
which are numbers assigned to the patient and the pathology reports for identification purposes; the
group ("Grp") to which the patients have been assigned; the anatomical location of the tumor
("Anatom Loc"); the primary tumor size ("Size"); the primary tumor grade ("Grade"); the
identification of the histopathological grade ("Histo Grade"); a description of local sites to which the
tumor had invaded ("Local Invasion"); the presence of lymph node metastases ("Lymph Met"); the
incidence of lymph node metastases (provided as a number of lymph nodes positive for metastasis
over the number of lymph nodes examined) ("Lymph Met Incid"); the regional lymph node grade
("Reg Lymph Grade"); the identification or detection of metastases to sites distant to the tumor and
their location ("Dist Met & Loc"); the grade of distant metastasis ("Dist Met Grade"); and general
comments about the patient or the tumor ("Comments"). Histopathology of all primary tumors
indicated the tumor was adenocarcinoma except for Patient ID Nos. 130 (for which no information
was provided), 392 (in which greater than 50% of the cells were mucinous carcinoma), and 784
(adenosquamous carcinoma). Extranodal extensions were described in three patients, Patient ID Nos.
784, 789, and 791. Lymphovascular invasion was described in Patient ID Nos. 128, 278, 517, 534,
784, 786, 789, 791, 890, and 892. Crohn's-like infiltrates were described in seven patients, Patient ID
Nos. 52, 264, 268, 392, 393, 784, and 791.

Table 12 (below) provides information about each patient from which the breast tissue
samples were isolated, including: 1) the "Pat Num", a number assigned to the patient for identification
purposes; 2) the "Histology", which indicates whether the tumor was characterized as an intraductal
carcinoma (IDC) or ductal carcinoma in situ (DCIS); 3) the incidence of lymph node metastases
(LMF), represented as the number of lymph nodes positive for metastases out of the total number
examined in the patient; 4) the "Tumor Size"; 5) the "TNM Stage", which provides the tumor grade
(T#), where the number indicates the grade and "p" indicates that the tumor grade is a pathological
classification; regional lymph node metastasis (N#), where "0" indicates no lymph node metastases
were found, "1" indicates lymph node metastases were found, and "X" means information not
available; and the identification or detection of metastases to sites distant to the tumor and their
location (M#), with "X" indicating that no distant metastases were reported; and 6) the stage of the
tumor ("Stage Grouping"). "nr" indicates "not reported".



Table 12. Breast cancer patient data.

Pat Num  Histology      LMF    Tumor Size  TNM Stage  Stage Grouping
280      IDC, DCIS+D2   nr     2 cm        T2NXMX     probable Stage II
284      IDC, DCIS      0/16   2 cm        T2pN0MX    Stage II
285      IDC, DCIS      nr     4.5 cm      T2NXMX     probable Stage II
291      IDC, DCIS      0/24   4.5 cm      T2pN0MX    Stage II
302      IDC, DCIS      nr     2.2 cm      T2NXMX     probable Stage II
375      IDC, DCIS      nr     1.5 cm      T1NXMX     probable Stage I
408      IDC            0/23   3.0 cm      HpN0MX     Stage II
416      IDC            0/6    3.3 cm      T2pN0MX    Stage II
421      IDC, DCIS             3.5 cm      I7NXMX     probable Stage II
459      IDC            2/5    4.9 cm      HpN1MX     Stage II
465      IDC            0/10   6.5 cm      DpN0MX     Stage II
470      IDC, DCIS      0/6    2.5 cm      DpN0MX     Stage II
472      IDC, DCIS      6/45   5.0+ cm     DpN1MX     Stage III
474      IDC            0/18   6.0 cm      DpN0MX     Stage II
476      IDC            0/16   3.4 cm      T2pN0MX    Stage II
605      IDC, DCIS      1/25   5.0 cm      DpN1MX     Stage II
649      IDC, DCIS      1/29   4.5 cm      T2pN1MX    Stage II



Identification of differentially expressed genes



cDNA probes were prepared from total RNA isolated from the patient cells described above.
Since LCM provides for the isolation of specific cell types to provide a substantially homogenous cell
sample, this provided for a similarly pure RNA sample.

Total RNA was first reverse transcribed into cDNA using a primer containing a T7 RNA
polymerase promoter, followed by second strand DNA synthesis. cDNA was then transcribed in vitro
to produce antisense RNA using the T7 promoter-mediated expression (see, e.g., Luo et al. (1999)
Nature Med 5:117-122), and the antisense RNA was then converted into cDNA. The second set of
cDNAs was again transcribed in vitro, using the T7 promoter, to provide antisense RNA. Optionally,
the RNA was again converted into cDNA, allowing for up to a third round of T7-mediated
amplification to produce more antisense RNA. Thus the procedure provided for two or three rounds
of in vitro transcription to produce the final RNA used for fluorescent labeling.

Fluorescent probes were generated by first adding control RNA to the antisense RNA mix,
and producing fluorescently labeled cDNA from the RNA starting material. Fluorescently labeled
cDNAs prepared from the tumor RNA sample were compared to fluorescently labeled cDNAs
prepared from the normal cell RNA sample. For example, the cDNA probes from the normal cells were
labeled with Cy3 fluorescent dye (green) and the cDNA probes prepared from the tumor cells were
labeled with Cy5 fluorescent dye (red), or vice versa.

Each array used had an identical spatial layout and control spot set. Each microarray was
divided into two areas, each area having an array with, on each half, twelve groupings of 32 x 12
spots, for a total of about 9,216 spots on each array. The two areas are spotted identically, which
provides for at least two duplicates of each clone per array.
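
The layout arithmetic above can be checked with a short sketch. The variable names are illustrative and are not part of the original protocol; only the numbers are taken from the text.

```python
# Spot-count check for the array layout described above.
# All names are illustrative; only the numbers come from the text.
AREAS_PER_ARRAY = 2          # two identically spotted areas
GROUPINGS_PER_AREA = 12      # twelve groupings in each area
GROUPING_SHAPE = (32, 12)    # each grouping is 32 x 12 spots

spots_per_grouping = GROUPING_SHAPE[0] * GROUPING_SHAPE[1]
spots_per_area = GROUPINGS_PER_AREA * spots_per_grouping
spots_per_array = AREAS_PER_ARRAY * spots_per_area
groupings_per_array = AREAS_PER_ARRAY * GROUPINGS_PER_AREA

print(spots_per_grouping)   # 384
print(spots_per_area)       # 4608
print(spots_per_array)      # 9216
print(groupings_per_array)  # 24
```

The total of 24 groupings per array is consistent with the "24 regions" referred to in the description of the control spots.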

Polynucleotides for use on the arrays were obtained from both publicly available sources and
from cDNA libraries generated from selected cell lines and patient tissues. PCR products of from
about 0.5 kb to 2.0 kb amplified from these sources were spotted onto the array using a Molecular
Dynamics Gen III spotter according to the manufacturer's recommendations. The first row of each of
the 24 regions on the array had about 32 control spots, including 4 negative control spots and 8 test
polynucleotides. The test polynucleotides were spiked into each sample before the labeling reaction
with a range of concentrations from 2-600 pg/slide and ratios of 1:1. For each array design, two slides
were hybridized with the test samples reverse-labeled in the labeling reaction. This provided for about
four duplicate measurements for each clone, two of one color and two of the other, for each sample.

The differential expression assay was performed by mixing equal amounts of probes from
tumor cells and normal cells of the same patient ("matched") or from tumor cells and normal cells of
different patients ("unmatched") (i.e., the tumor cells are from one patient and the normal cells are
from a different patient). The arrays were prehybridized by incubation for about 2 hrs at 60°C in 5X
SSC/0.2% SDS/1 mM EDTA, and then washed three times in water and twice in isopropanol.
Following prehybridization of the array, the probe mixture was then hybridized to the array under
conditions of high stringency (overnight at 42°C in 50% formamide, 5X SSC, and 0.2% SDS). After
hybridization, the array was washed at 55°C three times as follows: 1) first wash in 1X SSC/0.2%
SDS; 2) second wash in 0.1X SSC/0.2% SDS; and 3) third wash in 0.1X SSC.

The arrays were then scanned for green and red fluorescence using a Molecular Dynamics
Generation III dual color laser-scanner/detector. The images were processed using BioDiscovery
Autogene software, and the data from each scan set normalized to provide for a ratio of expression
relative to normal. Data from the microarray experiments were analyzed according to the algorithms
described in U.S. application serial no. 60/252,358, filed November 20, 2000, by E.J. Moler, M.A.
Boyle, and F.M. Randazzo, and entitled "Precision and accuracy in cDNA microarray data," which
application is specifically incorporated herein by reference.

The experiment was repeated, this time labeling the two probes with the opposite color in
order to perform the assay in both "color directions." Each experiment was sometimes repeated with
two more slides (one in each color direction). The level of fluorescence for each sequence on the array
was expressed as a ratio of the geometric mean of 8 replicate spots/gene from the four arrays, or 4
replicate spots/gene from 2 arrays, or some other permutation. The data were normalized using the spiked
positive controls present in each duplicated area, and the precision of this normalization was included
in the final determination of the significance of each differential. The fluorescent intensity of each spot
was also compared to the negative controls in each duplicated area to determine which spots have
detected significant expression levels in each sample.
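
As a rough illustration of the replicate averaging described above, the following sketch computes a tumor/normal ratio from the geometric means of replicate spot intensities. The function names and the example intensities are hypothetical, and the exact form of the ratio is an assumption; the normalization against spiked controls and the algorithms of the incorporated application are not reproduced here.

```python
import math

def geometric_mean(values):
    """Geometric mean of a non-empty list of strictly positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def expression_ratio(tumor_intensities, normal_intensities):
    """Assumed form: geometric mean of tumor spots over that of normal spots."""
    return geometric_mean(tumor_intensities) / geometric_mean(normal_intensities)

# Hypothetical intensities for 4 replicate spots of one gene (2 arrays).
tumor = [410.0, 395.0, 402.0, 388.0]
normal = [101.0, 97.0, 104.0, 99.0]
print(round(expression_ratio(tumor, normal), 2))
```

The geometric mean is the natural average for ratio data because it treats a 2-fold increase and a 2-fold decrease symmetrically.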

A statistical analysis of the fluorescent intensities was applied to each set of duplicate spots to
assess the precision and significance of each differential measurement, resulting in a p-value testing
the null hypothesis that there is no differential in the expression level between the tumor and normal
samples of each patient in matched samples or between tumor and normal samples of tissue from
different patients in unmatched samples. During initial analysis of the microarrays, the hypothesis was
accepted if p > 10^-3, and the differential ratio was set to 1.000 for those spots. All other spots have a
significant difference in expression between the tumor and normal sample. If the tumor sample has
detectable expression and the normal does not, the ratio is truncated at 1000 since the value for
expression in the normal sample would be zero, and the ratio would not be a mathematically useful
value (e.g., infinity). If the normal sample has detectable expression and the tumor does not, the ratio
is truncated to 0.001, since the value for expression in the tumor sample would be zero and the ratio
would not be a mathematically useful value. These latter two situations are referred to herein as
"on/off." Database tables were populated using a 95% confidence level (p>0.05).

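The ratio assignment rules described above (p-value cutoff and "on/off" truncation) can be sketched as follows. This is a minimal illustration, not the analysis software actually used; the function and parameter names are hypothetical, and the handling of a spot that is undetectable in both samples is an assumption, since the text does not address that case.

```python
def differential_ratio(tumor, normal, p_value,
                       tumor_detected=True, normal_detected=True,
                       p_cutoff=1e-3):
    """Return the tumor/normal ratio recorded for one gene.

    p_value tests the null hypothesis of no differential expression.
    """
    # Null hypothesis accepted: no differential, ratio set to 1.000.
    if p_value > p_cutoff:
        return 1.0
    # "On/off" cases: truncate instead of dividing by zero.
    if tumor_detected and not normal_detected:
        return 1000.0
    if normal_detected and not tumor_detected:
        return 0.001
    # Assumption (not in the text): undetectable in both means no differential.
    if not tumor_detected and not normal_detected:
        return 1.0
    return tumor / normal

print(differential_ratio(500.0, 100.0, p_value=0.01))    # 1.0 (not significant)
print(differential_ratio(500.0, 100.0, p_value=1e-5))    # 5.0
print(differential_ratio(500.0, 0.0, p_value=1e-5,
                         normal_detected=False))         # 1000.0
print(differential_ratio(0.0, 100.0, p_value=1e-5,
                         tumor_detected=False))          # 0.001
```
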
Table 13 (inserted prior to claims) provides the results for gene products expressed at levels at
least 2-fold greater in cancerous prostate, colon, or breast tissue samples relative to normal tissue
samples in at least 20% of the patients tested. Table 13 includes: 1) the SEQ ID NO ("SEQ ID")
assigned to each sequence for use in the present specification; 2) the sequence name ("SEQ NAME")
used as an internal identifier of the sequence; 3) the name assigned to the clone from which the
sequence was isolated ("CLONE ID"); 4) the percentage of patients tested in which expression levels
(e.g., as message level) of the gene were at least 2-fold greater in cancerous breast tissue than in
matched normal tissue ("BREAST PATIENTS >=2x"); 5) the breast number ratios, indicating the
number of patients upon which the provided ratio using matched breast tissue was based ("BREAST
NUM RATIOS"); 6) the percentage of patients tested in which expression levels (e.g., as message
level) of the gene were at least 2-fold greater in cancerous colon tissue than in matched normal tissue
("COLON PATIENTS >=2x"); 7) the colon number ratios, indicating the number of patients upon
which the provided ratio using matched colon tissue was based ("COLON NUM RATIOS"); 8) the
percentage of patients tested in which expression levels (e.g., as message level) of the gene were at
least 2-fold greater in cancerous colon tissue than in unmatched normal tissue ("COLON UM >=2x");
and 9) the unmatched colon number ratios, indicating the number of patients upon which the provided
ratio using unmatched colon tissue was based ("COLON UM NUM RATIOS").

Table 16 (inserted prior to claims) provides the results for other gene products expressed at
levels at least 2-fold greater in cancerous prostate, colon, or breast tissue samples, which may be
metastasized cancer samples, relative to normal tissue samples in at least 20% of the patients tested.
For each set of data (i.e., the percentage of patients in which a particular sequence is up-regulated in a
cancer tissue) the number of patients (Colon Cancer Patients; Colon Unmatched Met Patients and
Colon Match Met Patients) is shown. If a sample is matched, it is matched to a sample from the same
patient; if a sample is unmatched, the results obtained from that sample are compared to a pooled
sample of an appropriate tissue type from the patients. If a sample is not from a metastasized tissue, it
is from a primary tumor.

These data provide evidence that the genes represented by the polynucleotides having the
indicated sequences are differentially expressed in breast or prostate cancer as compared to the
corresponding normal non-cancerous tissue, and are differentially expressed in colon cancer as
compared to normal non-cancerous colon tissue.

The above methods can be performed to identify genes differentially expressed in cancerous 
and normal cells of any type of tissue, such as prostate, lung, colon, breast, and the like. 
Example 7: Antisense Regulation of Gene Expression 

The expression of the differentially expressed genes represented by the polynucleotides in the
cancerous cells can be further analyzed using antisense knockout technology to confirm the role and
function of the gene product in tumorigenesis, e.g., in promoting a metastatic phenotype.

Methods for analysis using antisense technology are well known in the art. For example, a
number of different oligonucleotides complementary to the mRNA generated by the differentially
expressed genes identified herein can be designed as potential antisense oligonucleotides, and tested
for their ability to suppress expression of the genes. Sets of antisense oligomers specific to each
candidate target are designed using the sequences of the polynucleotides corresponding to a
differentially expressed gene and the software program HYBsimulator Version 4 (available for
Windows 95/Windows NT or for Power Macintosh, RNAture, Inc., 1003 Health Sciences Road, West,
Irvine, CA 92612 USA). Factors that are considered when designing antisense oligonucleotides
include: 1) the secondary structure of oligonucleotides; 2) the secondary structure of the target gene;
3) the specificity with no or minimum cross-hybridization to other expressed genes; 4) stability;
5) length; and 6) terminal GC content. The antisense oligonucleotide is designed so that it will
hybridize to its target sequence under conditions of high stringency at physiological temperatures
(e.g., an optimal temperature for the cells in culture to provide for hybridization in the cell, e.g., about
37°C), but with minimal formation of homodimers.

Using the sets of oligomers and the HYBsimulator program, three to ten antisense
oligonucleotides and their reverse controls are designed and synthesized for each candidate mRNA
transcript, which transcript is obtained from the gene corresponding to the target polynucleotide
sequence of interest. Once synthesized and quantitated, the oligomers are screened for efficiency of a
transcript knock-out in a panel of cancer cell lines. The efficiency of the knock-out is determined by
analyzing mRNA levels using LightCycler quantification. The oligomers that result in the highest
level of transcript knock-out, wherein the level is at least about 50%, preferably about 80-90%, up
to 95% or more up to undetectable message, are selected for use in a cell-based proliferation assay, an
anchorage independent growth assay, and an apoptosis assay.

The ability of each designed antisense oligonucleotide to inhibit gene expression is tested
through transfection into LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 prostate carcinoma cells.
For each transfection mixture, a carrier molecule (such as a lipid, lipid derivative, lipid-like molecule,
cholesterol, cholesterol derivative, or cholesterol-like molecule) is prepared to a working concentration
of 0.5 mM in water, sonicated to yield a uniform solution, and filtered through a 0.45 µm PVDF
membrane. The antisense or control oligonucleotide is then prepared to a working concentration of
100 µM in sterile Millipore water. The oligonucleotide is further diluted in OptiMEM™
(Gibco/BRL), in a microfuge tube, to 2 µM, or approximately 20 µg oligo/ml of OptiMEM™. In a
separate microfuge tube, the carrier molecule, typically in the amount of about 1.5-2 nmol carrier/µg
antisense oligonucleotide, is diluted into the same volume of OptiMEM™ used to dilute the
oligonucleotide. The diluted antisense oligonucleotide is immediately added to the diluted carrier and
mixed by pipetting up and down. Oligonucleotide is added to the cells to a final concentration of 30
nM.
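
The dilution steps above follow the usual C1·V1 = C2·V2 relationship. The sketch below is illustrative only: the function name is hypothetical, and the 1000 µl volumes are assumed example values, since the text does not state the volumes used.

```python
def stock_volume(stock_conc, final_conc, final_volume):
    """Volume of stock required so that C1 * V1 = C2 * V2.

    Both concentrations must be in the same unit; the result is in the
    unit of final_volume.
    """
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * final_volume / stock_conc

# Concentrations in nM: 100 uM working stock = 100_000 nM, 2 uM = 2_000 nM.
# Volumes of 1000 ul are hypothetical example values.
print(stock_volume(100_000, 2_000, 1000))  # 20.0 ul of 100 uM stock per 1000 ul at 2 uM
print(stock_volume(2_000, 30, 1000))       # 15.0 ul of 2 uM dilution per 1000 ul at 30 nM
```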

The level of target mRNA that corresponds to a target gene of interest in the transfected cells
is quantitated in the cancer cell lines using the Roche LightCycler™ real-time PCR machine. Values
for the target mRNA are normalized versus an internal control (e.g., beta-actin). For each 20 µl
reaction, extracted RNA (generally 0.2-1 µg total) is placed into a sterile 0.5 or 1.5 ml microcentrifuge
tube, and water is added to a total volume of 12.5 µl. To each tube is added 7.5 µl of a buffer/enzyme
mixture, prepared by mixing (in the order listed) 2.5 µl H2O, 2.0 µl 10X reaction buffer, 1.0 µl oligo
dT (20 pmol), 1.0 µl dNTP mix (10 mM each), 0.5 µl RNAsin® (20u) (Ambion, Inc., Hialeah, FL),
and 0.5 µl MMLV reverse transcriptase (50u) (Ambion, Inc.). The contents are mixed by pipetting up
and down, and the reaction mixture is incubated at 42°C for 1 hour. The contents of each tube are
centrifuged prior to amplification.
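
The reverse transcription volumes above can be checked against the stated totals with a short sketch. The dictionary keys are illustrative labels. The oligo dT volume is taken as 1.0 µl, which is an assumption: it is the value for which the listed components sum to the stated 7.5 µl of buffer/enzyme mixture.

```python
# Buffer/enzyme mixture components in microliters, in the order listed.
RT_MIX_UL = {
    "H2O": 2.5,
    "10X reaction buffer": 2.0,
    "oligo dT (20 pmol)": 1.0,   # assumed value; see note above
    "dNTP mix (10 mM each)": 1.0,
    "RNAsin (20 u)": 0.5,
    "MMLV reverse transcriptase (50 u)": 0.5,
}
RNA_PLUS_WATER_UL = 12.5  # extracted RNA brought up to volume with water

mix_total_ul = sum(RT_MIX_UL.values())
reaction_total_ul = RNA_PLUS_WATER_UL + mix_total_ul

print(mix_total_ul)       # 7.5
print(reaction_total_ul)  # 20.0
```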

An amplification mixture is prepared by mixing in the following order: 1X PCR buffer II, 3
mM MgCl2, 140 µM each dNTP, 0.175 pmol each oligo, 1:50,000 dil of SYBR® Green, 0.25 mg/ml
BSA, 1 unit Taq polymerase, and H2O to 20 µl. (PCR buffer II is available in 10X concentration from
Perkin-Elmer, Norwalk, CT. In 1X concentration it contains 10 mM Tris pH 8.3 and 50 mM KCl.)
SYBR® Green (Molecular Probes, Eugene, OR) is a dye which fluoresces when bound to double
stranded DNA. As double stranded PCR product is produced during amplification, the fluorescence
from SYBR® Green increases. To each 20 µl aliquot of amplification mixture, 2 µl of template RT is
added, and amplification is carried out according to standard protocols. The results are expressed as
the percent decrease in expression of the corresponding gene product relative to non-transfected cells,
vehicle-only transfected (mock-transfected) cells, or cells transfected with reverse control
oligonucleotides.
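
One plausible way to compute the percent decrease described above is sketched below, assuming that the target signal in each sample is first divided by the internal control signal (e.g., beta-actin) from the same sample. The function name and the example quantities are hypothetical; the text does not specify the exact formula.

```python
def percent_decrease(target_treated, control_gene_treated,
                     target_reference, control_gene_reference):
    """Percent decrease of normalized target mRNA relative to a reference sample.

    The reference sample may be non-transfected, mock-transfected, or
    reverse-control-transfected cells.
    """
    treated = target_treated / control_gene_treated
    reference = target_reference / control_gene_reference
    return 100.0 * (1.0 - treated / reference)

# Hypothetical quantities in arbitrary units.
print(round(percent_decrease(10.0, 100.0, 100.0, 100.0), 1))  # 90.0
print(round(percent_decrease(50.0, 80.0, 100.0, 80.0), 1))    # 50.0
```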

Example 8: Effect of Expression on Proliferation 
The effect of gene expression on the inhibition of cell proliferation can be assessed in
metastatic breast cancer cell lines (MDA-MB-231 ("231")); SW620 colon colorectal carcinoma cells;
SKOV3 cells (a human ovarian carcinoma cell line); or LNCaP, PC3, 22Rv1, MDA-PCA-2b, or
DU145 prostate cancer cells.

Cells are plated to approximately 60-80% confluency in 96-well dishes. Antisense or reverse
control oligonucleotide is diluted to 2 µM in OptiMEM™. The oligonucleotide-OptiMEM™ can then
be added to a delivery vehicle, which delivery vehicle can be selected so as to be optimized for the
particular cell type to be used in the assay. The oligo/delivery vehicle mixture is then further diluted
into medium with serum on the cells. The final concentration of oligonucleotide for all experiments
can be about 300 nM.

Antisense oligonucleotides are prepared as described above (see Example 3). Cells are
transfected overnight at 37°C and the transfection mixture is replaced with fresh medium the next
morning. Transfection is carried out as described above in Example 8.

Those antisense oligonucleotides that result in inhibition of proliferation of SW620 cells
indicate that the corresponding gene plays a role in production or maintenance of the cancerous
phenotype in cancerous colon cells. Those antisense oligonucleotides that inhibit proliferation in
SKOV3 cells represent genes that play a role in production or maintenance of the cancerous
phenotype in cancerous ovarian cells. Those antisense oligonucleotides that result in inhibition of
proliferation of MDA-MB-231 cells indicate that the corresponding gene plays a role in production or
maintenance of the cancerous phenotype in cancerous breast cells. Those antisense oligonucleotides
that inhibit proliferation in LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 cells represent genes that
play a role in production or maintenance of the cancerous phenotype in cancerous prostate cells.

The following antisense oligonucleotides, corresponding to a glutathione transferase omega
identified by SEQ ID NOS:1377 and 1541 (Chiron Candidate Id 21), were used to inhibit proliferation
of SW620 colon colorectal carcinoma cells: TTGGTTCCCAAGACAAGCCGTGAC (SEQ ID NO:1543);
TCTCAACGCTACCAGGCACTCCTTG (SEQ ID NO:1544);
GCACAGCCCAAAGTCAAAGGCATTA (SEQ ID NO:1545);
CAGGCACTCCTTGGTCAAATGTGGG (SEQ ID NO:1546);
GGACAGGGAAAGGAGAGGCTAGTCA (SEQ ID NO:1547); and
TGCATTCTCTCCCACATCTCAACGC (SEQ ID NO:1548). These antisense molecules reduced
glutathione transferase omega RNA expression by approximately 90%.
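
Two of the design factors discussed in Example 7, length and GC content, can be tabulated for the listed oligonucleotides with a short sketch. The helper names are hypothetical, and treating a "reverse control" as the same bases in reversed order is an assumption about how such controls are built; the sequences are copied from the text as printed.

```python
# Antisense oligonucleotides as listed above, keyed by SEQ ID NO.
OLIGOS = {
    1543: "TTGGTTCCCAAGACAAGCCGTGAC",
    1544: "TCTCAACGCTACCAGGCACTCCTTG",
    1545: "GCACAGCCCAAAGTCAAAGGCATTA",
    1546: "CAGGCACTCCTTGGTCAAATGTGGG",
    1547: "GGACAGGGAAAGGAGAGGCTAGTCA",
    1548: "TGCATTCTCTCCCACATCTCAACGC",
}

def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def reverse_control(seq):
    """Assumed reverse control: the same bases in reversed order."""
    return seq[::-1]

for seq_id, seq in OLIGOS.items():
    print(seq_id, len(seq), round(gc_fraction(seq), 2), reverse_control(seq))
```

A reversed sequence keeps the base composition, and therefore the GC fraction, of the antisense oligonucleotide while no longer being complementary to the target.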

Example 9: Effect of Gene Expression on Cell Migration

The effect of gene expression on the inhibition of cell migration can be assessed in LNCaP,
PC3, 22Rv1, MDA-PCA-2b, or DU145 prostate cancer cells using static endothelial cell binding
assays, non-static endothelial cell binding assays, and transmigration assays.

For the static endothelial cell binding assay, antisense oligonucleotides are prepared as
described above (see Example 8). Two days prior to use, prostate cancer cells (CaP) are plated and
transfected with antisense oligonucleotide as described above (see Examples 3 and 4). On the day
before use, the medium is replaced with fresh medium, and on the day of use, the medium is replaced
with fresh medium containing 2 µM CellTracker green CMFDA (Molecular Probes, Inc.) and cells are
incubated for 30 min. Following incubation, CaP medium is replaced with fresh medium (no
CMFDA) and cells are incubated for an additional 30-60 min. CaP cells are detached using CMF
PBS/2.5 mM EDTA or trypsin, spun and resuspended in DMEM/1% BSA/10 mM HEPES pH 7.0.
Finally, CaP cells are counted and resuspended at a concentration of 1x10^6 cells/ml.

Endothelial cells (EC) are plated onto 96-well plates at 40-50% confluence 3 days prior to
use. On the day of use, EC are washed 1X with PBS and 50 µl DMEM/1% BSA/10 mM HEPES pH 7
is added to each well. To each well is then added 50K (50 µl) CaP cells in DMEM/1% BSA/10 mM
HEPES pH 7. The plates are incubated for an additional 30 min and washed 5X with PBS containing
Ca2+ and Mg2+. After the final wash, 100 µL PBS is added to each well and fluorescence is read on a
fluorescent plate reader (Ab492/Em 516 nm).

For the non-static endothelial cell binding assay, CaP are prepared as described above. EC
are plated onto 24-well plates at 30-40% confluence 3 days prior to use. On the day of use, a subset of
EC are treated with cytokine for 6 hours then washed 2X with PBS. To each well is then added 150-
200K CaP cells in DMEM/1% BSA/10 mM HEPES pH 7. Plates are placed on a rotating shaker (70
RPM) for 30 min and then washed 3X with PBS containing Ca2+ and Mg2+. After the final wash, 500
µL PBS is added to each well and fluorescence is read on a fluorescent plate reader (Ab492/Em 516
nm).

For the transmigration assay, CaP are prepared as described above with the following
changes. On the day of use, CaP medium is replaced with fresh medium containing 5 µM CellTracker
green CMFDA (Molecular Probes, Inc.) and cells are incubated for 30 min. Following incubation,
CaP medium is replaced with fresh medium (no CMFDA) and cells are incubated for an additional
30-60 min. CaP cells are detached using CMF PBS/2.5 mM EDTA or trypsin, spun and resuspended
in EGM-2-MV medium. Finally, CaP cells are counted and resuspended at a concentration of 1x10^6
cells/ml.

EC are plated onto FluorBlok transwells (BD Biosciences) at 30-40% confluence 5-7 days
before use. Medium is replaced with fresh medium 3 days before use and on the day of use. To each
transwell is then added 50K labeled CaP. 30 min prior to the first fluorescence reading, 10 µg of
FITC-dextran (10K MW) is added to the EC plated filter. Fluorescence is then read at multiple time
points on a fluorescent plate reader (Ab492/Em 516 nm).

Those antisense oligonucleotides that result in inhibition of binding of LNCaP, PC3, 22Rv1,
MDA-PCA-2b, or DU145 prostate cancer cells to endothelial cells indicate that the corresponding
gene plays a role in the production or maintenance of the cancerous phenotype in cancerous prostate
cells. Those antisense oligonucleotides that result in inhibition of endothelial cell transmigration by
LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 prostate cancer cells indicate that the corresponding
gene plays a role in the production or maintenance of the cancerous phenotype in cancerous prostate
cells.

Example 10: Effect of Gene Expression on Colony Formation

The effect of gene expression upon colony formation of SW620 cells, SKOV3 cells, MDA-MB-231
cells, LNCaP cells, PC3 cells, 22Rv1 cells, MDA-PCA-2b cells, and DU145 cells can be
tested in a soft agar assay. Soft agar assays are conducted by first establishing a bottom layer of 2 ml
of 0.6% agar in media plated fresh within a few hours of layering on the cells. The cell layer is
formed on the bottom layer by removing cells transfected as described above from plates using 0.05%
trypsin and washing twice in media. The cells are counted in a Coulter counter, and resuspended to
10^6 per ml in media. 10 µl aliquots are placed with media in 96-well plates (to check counting with
WST-1), or diluted further for the soft agar assay. 2000 cells are plated in 800 µl 0.4% agar in
duplicate wells above 0.6% agar bottom layer. After the cell layer agar solidifies, 2 ml of media is
dribbled on top and antisense or reverse control oligo (produced as described in Example 8) is added
without delivery vehicles. Fresh media and oligos are added every 3-4 days. Colonies form in 10
days to 3 weeks. Fields of colonies are counted by eye. WST-1 metabolism values can be used to
compensate for small differences in starting cell number. Larger fields can be scanned for visual
record of differences.

Those antisense oligonucleotides that result in inhibition of colony formation of SW620 cells
indicate that the corresponding gene plays a role in production or maintenance of the cancerous
phenotype in cancerous colon cells. Those antisense oligonucleotides that inhibit colony formation in
SKOV3 cells represent genes that play a role in production or maintenance of the cancerous
phenotype in cancerous ovarian cells. Those antisense oligonucleotides that result in inhibition of
colony formation of MDA-MB-231 cells indicate that the corresponding gene plays a role in
production or maintenance of the cancerous phenotype in cancerous breast cells. Those antisense
oligonucleotides that inhibit colony formation in LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145
cells represent genes that play a role in production or maintenance of the cancerous phenotype in
cancerous prostate cells.



In order to assess the effect of depletion of a target message upon cell death, LNCaP, PC3,
22Rv1, MDA-PCA-2b, or DU145 cells, or other cells derived from a cancer of interest, can be
transfected for proliferation assays. For cytotoxic effect in the presence of cisplatin (cis), the same
protocol is followed but cells are left in the presence of 2 µM drug. Each day, cytotoxicity is
monitored by measuring the amount of LDH enzyme released in the medium due to membrane
damage. The activity of LDH is measured using the Cytotoxicity Detection Kit from Roche Molecular
Biochemicals. The data is provided as a ratio of LDH released in the medium vs. the total LDH
present in the well at the same time point and treatment (rLDH/tLDH). A positive control using
antisense and reverse control oligonucleotides for BCL2 (a known anti-apoptotic gene) is included;
loss of message for BCL2 leads to an increase in cell death compared with treatment with the control
oligonucleotide (background cytotoxicity due to transfection).
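
The rLDH/tLDH readout described above is a simple ratio taken at matching time points and treatments. The sketch below is illustrative: the function name and the example activity values are hypothetical and are not data from the specification.

```python
def ldh_ratio(released_ldh, total_ldh):
    """Ratio of LDH released into the medium to total LDH in the well (rLDH/tLDH)."""
    if total_ldh <= 0:
        raise ValueError("total LDH must be positive")
    return released_ldh / total_ldh

# Hypothetical LDH activities (arbitrary units) at one time point.
antisense = ldh_ratio(released_ldh=60, total_ldh=120)
reverse_control = ldh_ratio(released_ldh=30, total_ldh=120)

print(antisense)        # 0.5
print(reverse_control)  # 0.25
# A higher ratio for the antisense oligo than for its reverse control would
# suggest cell death beyond the background cytotoxicity of transfection.
print(antisense > reverse_control)  # True
```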

Example 12: Functional Analysis of Gene Products Differentially Expressed in Cancer 
The gene products of sequences of a gene differentially expressed in cancerous cells can be
further analyzed to confirm the role and function of the gene product in tumorigenesis, e.g., in
promoting or inhibiting development of a metastatic phenotype. For example, the function of gene
products corresponding to genes identified herein can be assessed by blocking function of the gene
products in the cell. For example, where the gene product is secreted or associated with a cell surface
membrane, blocking antibodies can be generated and added to cells to examine the effect upon the cell
phenotype in the context of, for example, the transformation of the cell to a cancerous, particularly a
metastatic, phenotype. In order to generate antibodies, a clone corresponding to a selected gene
product is selected, and a sequence that represents a partial or complete coding sequence is obtained.
The resulting clone is expressed, the polypeptide produced isolated, and antibodies generated. The
antibodies are then combined with cells and the effect upon tumorigenesis assessed.

Where the gene product of the differentially expressed genes identified herein exhibits
sequence homology to a protein of known function (e.g., to a specific kinase or protease) and/or to a
protein family of known function (e.g., contains a domain or other consensus sequence present in a
protease family or in a kinase family), then the role of the gene product in tumorigenesis, as well as
the activity of the gene product, can be examined using small molecules that inhibit or enhance
function of the corresponding protein or protein family.

Additional functional assays include, but are not necessarily limited to, those that analyze the
effect of expression of the corresponding gene upon cell cycle and cell migration. Methods for
performing such assays are well known in the art.

Example 13: Deposit Information

Deposits of the biological materials in the tables referenced below were made with either the
Agricultural Research Service Culture Collection (NRRL), 1815 North University Street, Peoria,
Illinois 61604, or with the American Type Culture Collection (ATCC), 10801 University Blvd.,
Manassas, VA 20110-2209, under the provisions of the Budapest Treaty, on or before the filing date of
the present application. The accession number indicated is assigned after successful viability testing,
and the requisite fees were paid. Access to said cultures will be available during pendency of the
patent application to one determined by the Commissioner to be entitled to such under 37 C.F.R.
§1.14 and 35 U.S.C. §122. All restrictions on availability of said cultures to the public will be
irrevocably removed upon the granting of a patent based upon the application. Moreover, the
designated deposits will be maintained for a period of thirty (30) years from the date of deposit, or for
five (5) years after the last request for the deposit; or for the enforceable life of the U.S. patent,
whichever is longer. Should a culture become nonviable or be inadvertently destroyed, or, in the case
of plasmid-containing strains, lose its plasmid, it will be replaced with a viable culture(s) of the same
taxonomic description.

These deposits are provided merely as a convenience to those of skill in the art, and are not an
admission that a deposit is required. A license may be required to make, use, or sell the deposited
materials, and no such license is hereby granted. The deposit below was received by the ATCC on or
before the filing date of the present application.



Table 14. Cell Lines Deposited with ATCC

Cell Line     Deposit Date       ATCC Accession No.   CMCC Accession No.
KM12L4-A      March 19, 1998     CRL-12496            11606
KM12C         May 15, 1998       CRL-12533            11611
MDA-MB-231    May 15, 1998       CRL-12532            10583
MCF-7         October 9, 1998    CRL-12584            10377



In addition, pools of selected clones, as well as libraries containing specific clones, were
assigned an "ES" number and a "CMCC" number (both internal references) and deposited with the
NRRL. Table 15 (inserted before the claims) provides the NRRL Accession Nos. of the clones
deposited as libraries named ES219-ES225 (CMCC5471-CMCC5477, respectively) on November 1,
2001, and of the clones deposited as a library named ES226 (CMCC5478) on November 7, 2001.

Retrieval of Individual Clones from Deposit of Pooled Clones. Where the biological deposit
is composed of a pool of cDNA clones or a library of cDNA clones, the deposit was prepared by first
transfecting each of the clones into separate bacterial cells. The clones in the pool or library were then
deposited as a pool of equal mixtures in the composite deposit. Particular clones can be obtained from
the composite deposit using methods well known in the art. For example, a bacterial cell containing a
particular clone can be identified by isolating single colonies, and identifying colonies containing the
specific clone through standard colony hybridization techniques, using an oligonucleotide probe or
probes designed to specifically hybridize to a sequence of the clone insert (e.g., a probe based upon
unmasked sequence of the encoded polynucleotide having the indicated SEQ ID NO). The probe
should be designed to have a Tm of approximately 80°C (assuming 2°C for each A or T and 4°C for
each G or C). Positive colonies can then be picked, grown in culture, and the recombinant clone
isolated. Alternatively, probes designed in this manner can be used in PCR to isolate a nucleic acid
molecule from the pooled clones according to methods well known in the art, e.g., by purifying the
cDNA from the deposited culture pool, and using the probes in PCR reactions to produce an amplified
product having the corresponding desired polynucleotide sequence.
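
The melting-temperature rule stated above (2°C for each A or T, 4°C for each G or C) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not a procedure given in this specification: the function names `wallace_tm` and `find_probe`, the 2°C tolerance, and the choice to take the first qualifying window of the insert sequence are all hypothetical. Real probe design would also consider specificity, secondary structure, and masked regions.

```python
def wallace_tm(probe: str) -> int:
    """Estimate Tm in degrees C: 2 per A or T plus 4 per G or C."""
    seq = probe.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in probe: {sorted(invalid)}")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


def find_probe(target: str, goal_tm: int = 80, tolerance: int = 2):
    """Return (probe, tm) for the first window of `target` whose estimated
    Tm is within `tolerance` of `goal_tm`, or None if no window qualifies.

    `tolerance` is an illustrative assumption; the text only says
    "approximately 80 degrees C".
    """
    seq = target.upper()
    wallace_tm(seq)  # validates the alphabet; raises ValueError otherwise
    for start in range(len(seq)):
        tm = 0
        for end in range(start, len(seq)):
            # Extend the window one base at a time, updating the running Tm.
            tm += 4 if seq[end] in "GC" else 2
            if abs(tm - goal_tm) <= tolerance:
                return seq[start:end + 1], tm
            if tm > goal_tm + tolerance:
                break  # window is already too long; try the next start
    return None
```

Under this rule a probe reaches roughly 80°C at 20 bases if it is all G/C and at about 40 bases if it is all A/T, so candidate probes for a typical insert fall somewhere between those lengths.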

Although the foregoing invention has been described in some detail by way of illustration and
example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the
art in light of the teachings of this invention that certain changes and modifications may be made
thereto without departing from the spirit or scope of the appended claims. Those skilled in the art will
recognize, or be able to ascertain, using not more than routine experimentation, many equivalents to
the specific embodiments of the invention described herein. Such specific embodiments and
equivalents are intended to be encompassed by the following claims.

Table 8

SEQ ID | SEQ NAME | ACCESSION | GENBANK DESCRIPTION | GENBANK SCORE
1 | 3538.O24.GZ43_504925 | AF047717 | Streptomyces chrysomallus actinomycin synthetase II (acmB) gene, complete cds | 1.17E-04
2 | 3538.P11.GZ43_504718 | AF111848 | Homo sapiens PRO0529 mRNA, complete cds | 2.00E-06
3 | 3541.A04.GZ43_504975 | X58178 | | 5.00E-06
4 | 3541.A05.GZ43_504991 | AF190638 | Mus musculus nephrin NPHS1 (Nphs1) | 2.00E-06
5 | 3541.A16.GZ43_505167 | AB024689 | Mus musculus gene, exon 3, partial | 6.00E-06
6 | 3541.A23.GZ43_505279 | M14155 | Human insulin-like growth factor (IGF-I) IB gene, exon 4 | 3.00E-06
7 | 3541.B04.GZ43_504976 | U32801 | Haemophilus influenzae Rd section 116 of 163 of the complete genome | 1.10E-05
8 | 3541.B17.GZ43_505184 | X89398 | H. sapiens ung gene for uracil DNA- |
9 | 3538.G08.GZ43_504661 | AF270390 | Staphylococcus epidermidis strain SR1 clone step.4045d08 genomic sequence | 3.00E-06
10 | 3538.G17.GZ43_504805 | AC006623 | Caenorhabditis elegans clone C52E2, | 4.00E-06
11 | 3538.G19.GZ43_504837 | AB042425 | Homo sapiens Pim-2h, hUGT2, hUGT1, genes for pim-2 protooncogene homolog, UDP-galactose transporter 1, UDP-galactose transporter 2, complete cds | 6.60E-11
12 | 3538.G22.GZ43_504885 | L08338 | Human immunodeficiency virus type 1 proviral envelope glycoprotein gene V3 region from Al 96/4537, clone 6 (from | 3.10E-07
13 | 3538.H05.GZ43_504614 | AE006731 | Sulfolobus solfataricus section 90 of 272 of | 2.00E-06
14 | 3538.H21.GZ43_504870 | AL121807 | S.pombe chromosome III cosmid c132 | 1.30E-05
15 | 3538.I08.GZ43_504663 | AF186379 | Homo sapiens ligand effect modulator-6 (LEM6) mRNA, complete cds | 8.00E-10
16 | 3538.I13.GZ43_504743 | AC007658 | Arabidopsis thaliana chromosome II section Sequence from clones F27I1 | 3.30E-08
17 | 3538.J22.GZ43_504888 | X04616 | Anacystis nidulans R2 psbAI gene for | 8.90E-07
18 | 3538.K12.GZ43_504729 | X91656 | M.musculus Srp20 gene | 4.40E-05
19 | 3538.K23.GZ43_504905 | M62849 | | 4.40E-07
20 | 3538.L16.GZ43_504794 | AE001382 | Plasmodium falciparum chromosome 2, section 19 of 73 of the complete sequence | 7.00E-06
21 | 3538.M02.GZ43_504571 | U07976 | Human T cell receptor beta (TCRBV7S2, TCRBV13S2-1, TCRBV6S7-1) genes, TCRBV deleted 2 haplotype, partial cds | 7.00E-06
22 | 3538.M05.GZ43_504619 | AC079878 | Homo sapiens BAC clone RP11-343P21 from 7, complete sequence | 1.40E-07
23 | 3538.M08.GZ43_504667 | AF182668 | Zenaida galapagoensis beta-fibrinogen gene, partial sequence | 4.70E-08
24 | 3538.N20.GZ43_504860 | AB033411 | Taenia crassiceps mitochondrial gene for | 6.80E-07
25 | 3538.O07.GZ43_504653 | X68019 | Feline Immunodeficiency Virus GAG gene | 4.00E-06
26 | 3541.E11.GZ43_505091 | M73447 | Human repeat polymorphism at locus D9S59 | 3.00E-08
27 | 3541.E14.GZ43_505139 | AJ243419 | Acaulospora trappei partial 18S rRNA, 5.8S rRNA and partial 28S rRNA genes and internal transcribed spacers 1 and 2 (ITS1, ITS2), isolate AU 219 | 1.10E-07
28 | 3541.E15.GZ43_505155 | U13679 | Human lactate dehydrogenase-A (LDH-A) gene, promoter region | 3.50E-10
29 | 3541.G17.GZ43_505189 | AE004851 | Pseudomonas aeruginosa PA01, section 412 of 529 of the complete genome | 1.30E-05
30 | 3541.H14.GZ43_505142 | AJ252202 | Drosophila melanogaster D-COQ7 gene for putative COQ7 isologue, exons 1-3 | 9.00E-06
31 | 3541.I15.GZ43_505159 | X98371 | | 6.00E-06
32 | 3541.I17.GZ43_505191 | AK023918 | Homo sapiens cDNA FLJ13856 fis, clone THYRO1000988 | 1.70E-22
33 | 3541.I18.GZ43_505207 | AF329081 | Bos taurus AMP-activated protein kinase gamma-1 (PRKAG1) gene, partial cds | 5.30E-33
34 | 3541.J19.GZ43_505224 | AF002749 | Psychotria urceolata ribosomal protein S16 (rps16) gene, chloroplast gene encoding chloroplast protein, partial intron | 3.01E-03
35 | 3541.K09.GZ43_505065 | AF027607 | Gallus gallus L-type voltage-gated calcium channel alpha1D subunit ChCaChA1D precursor mRNA, complete intron sequence | 9.00E-06
36 | 3541.L19.GZ43_505226 | AE003949 | Xylella fastidiosa 9a5c, section 95 of 229 of the complete genome | 2.00E-06
37 | 3541.M02.GZ43_50495 | BC004556 | Homo sapiens, Similar to CG7083 gene product, clone MGC:10534 IMAGE:3957147, mRNA, complete cds | 6.20E-07
38 | 3541.M07.GZ43_505035 | X05616 | Kangaroo rat repetitive DNA with insertion sequence | 4.80E-08
39 | 3541.M18.GZ43_505211 | M81888 | Parvovirus LuIII DNA sequence | 6.60E-05
40 | 3541.O04.GZ43_504989 | AF081828 | Ixodes hexagonus mitochondrial DNA, complete genome | 3.00E-06
41 | 3541.O13.GZ43_505133 | AK026465 | Homo sapiens cDNA: FLJ22812 fis, clone KAIA2955 | 8.00E-06
42 | 3541.O23.GZ43_505293 | X54859 | Porcine TNF-alpha and TNF-beta genes for tumour necrosis factors alpha and beta, respectively | 2.90E-05
43 | 3541.P05.GZ43_505006 | AE006642 | Sulfolobus solfataricus section 1 of 272 of the complete genome | 3.50E-05
44 | 3541.P22.GZ43_505278 | U10400 | Saccharomyces cerevisiae chromosome VIII cosmid L2825 | 1.80E-05
45 | 3544.A09.GZ43_505439 | X75677 | C.parapsilosis mttRNA genes (591bps) | 3.70E-08
46 | 3544.A13.GZ43_505503 | D28811 | Schistosoma japonicum mRNA for paramyosin, complete cds |
47 | 3544.A14.GZ43_505519 | | Human immunodeficiency virus type 2 | 2.90E-05
48 | 3544.A17.GZ43_505567 | L23650 | Caenorhabditis elegans cosmid C27D11, complete sequence | 5.60E-07
49 | 3544.B02.GZ43_505328 | AF060543 | mRNA, complete cds | 1.50E-49
50 | 3544.B09.GZ43_505440 | AB051473 | Homo sapiens mRNA for KIAA1686 protein, partial cds | 1.80E-05
51 | 3544.B18.GZ43_505584 | AJ224821 | genomic sequence | 4.00E-06
52 | 3544.E05.GZ43_505379 | AL451187 | Human DNA sequence from clone RP11-49J23 on chromosome 6, complete sequence | 1.30E-07
53 | 3544.E18.GZ43_505587 | L08338 | Human immunodeficiency virus type 1 proviral envelope glycoprotein gene V3 region from Al 96/4537, clone 6 (from adult) | 3.30E-07
54 | 3544.F06.GZ43_505396 | X60833 | R.norvegicus TDO2 gene for tryptophan 2,3-dioxygenase, exon 6 | 7.80E-07
55 | 3544.F16.GZ43_505556 | | Drosophila melanogaster D3-100EF mRNA, | 2.00E-06
56 | 3544.G06.GZ43_505397 | AC002359 | Homo sapiens Xp22 Cosmid U239B3 (from sequence | 1.60E-05
57 | 3544.G10.GZ43_505461 | | Crithidia oncopelti mitochondrial ND4, ND5, COI, 12S ribosomal RNA genes for NADH dehydrogenase subunit 4/5, cytochrome oxidase subunit I and 12S | 4.80E-05
58 | 3544.G11.GZ43_505477 | U80927 | Dictyostelium discoideum unknown protein gene, complete cds | 9.00E-08
59 | 3544.G12.GZ43_505493 | AF245483 | Oryza sativa OSE4 (OSE4) gene complete cds | 1.70E-07
60 | 3544.H03.GZ43_505350 | | | 2.30E-05
61 | 3544.H15.GZ43_505542 | AF194829 | Tetragonia tetragonioides NADH chloroplast gene for chloroplast product | 2.00E-06
62 | 3544.H24.GZ43_505686 | BC008353 | Homo sapiens, Similar to RIKEN cDNA 0610008P16 gene, clone MGC:15937 IMAGE:3537224, mRNA, complete cds | 2.50E-18
63 | 3544.I07.GZ43_505415 | AF010533 | Plasmodium falciparum microsatellite TA21 sequence | 1.80E-08
64 | 3544.I15.GZ43_505543 | D29794 | Mouse gene for T cell receptor gamma | 3.00E-06
65 | 3544.I20.GZ43_505623 | AE000677 | Aquifex aeolicus section 9 of 109 of the complete genome | 4.00E-06
66 | 3544.J04.GZ43_505368 | Z97015 | Lactococcus lactis cremoris sucrose gene cluster | 1.00E-06
67 | 3544.J11.GZ43_505480 | M67480 | Human prothymosin-alpha gene, complete | 5.10E-10
68 | 3544.J13.GZ43_505512 | AJ249884 | Lepeophtheirus salmonis microsatellite DNA, locus Ls.NUIG.09 | 5.70E-08
69 | 3544.J23.GZ43_505672 | AJ245823 | Trypanosoma brucei PK4 gene for protein kinase | 6.00E-06
70 | 3544.K16.GZ43_505561 | U18191 | Human HLA class I genomic survey sequence | 2.50E-07
71 | 3544.L11.GZ43_505482 | X07127 | Kluyveromyces lactis killer plasmid k1 DNA | 2.90E-05
72 | 3544.L13.GZ43_505514 | BC005028 | Homo sapiens, hypothetical protein FLJ11323, clone MGC:12582 IMAGE:3953383, mRNA, complete cds | 1.80E-31
73 | 3544.M06.GZ43_505403 | AC006687 | Caenorhabditis elegans cosmid T20C7, complete sequence | 2.30E-05
74 | 3544.M10.GZ43_505467 | M92378 | Mus musculus GABA transporter mRNA sequence | 1.30E-05
75 | 3544.N07.GZ43_505420 | U48705 | Human receptor tyrosine kinase DDR gene, complete cds | 7.40E-07
76 | 3544.N12.GZ43_505500 | BC007621 | Homo sapiens, Similar to Orthodenticle (Drosophila) homolog 1, clone MGC:15736 IMAGE:3355563, mRNA, complete cds | 5.70E-07
77 | 3544.N19.GZ43_505612 | AF270077 | Staphylococcus epidermidis strain SR1 clone step.1047c06 genomic sequence | 2.00E-07
78 | 3544.O03.GZ43_505357 | U15681 | Myrmecia pilosula HI87-156 mitochondrion cytochrome b gene, partial cds | 1.00E-06
79 | 3544.O10.GZ43_505469 | AF056032 | Homo sapiens kynurenine 3-hydroxylase mRNA, complete cds | 5.00E-06
80 | 3544.O15.GZ43_505549 | U37373 | Xenopus laevis tail-specific thyroid hormone up-regulated (gene 5) mRNA, complete cds | 3.00E-06
81 | 3544.O20.GZ43_505629 | D66906 | Bombyx mori DNA for sorbitol dehydrogenase, complete cds | 2.00E-06
82 | 3544.P18.GZ43_505598 | J04357 | Red clover necrotic mosaic virus RNA-1, complete sequence | 4.00E-06
83 | 3547.A04.GZ43_505743 | AF118558 | Mus musculus hitchhiker-3, hitchhiker-4, and hitchhiker-5 mRNA sequences | 5.40E-07
84 | 3547.A11.GZ43_505855 | U93874 | Bacillus subtilis cysteine synthase (yrhA), cystathionine gamma-lyase (yrhB), YrhC (yrhC), YrhD (yrhD), formate dehydrogenase chain A (yrhE), YrhF (yrhF), formate dehydrogenase (yrhG), YrhH (yrhH), regulatory protein (yrhI), cytochrome P450 102 (yrhJ),> | 4.00E-06
85 | 3547.A24.GZ43_506063 | AL157466 | Homo sapiens mRNA; cDNA DKFZp761E2423 (from clone DKFZp761E2423) | 8.80E-07
86 | 3547.C05.GZ43_505761 | X52589 | Bovine rotavirus RNA for virus protein 2 (VP2) | 1.00E-05
87 | 3547.C17.GZ43_505953 | U67594 | 150 of the complete genome | 3.80E-05
88 | 3547.C23.GZ43_506049 | AJ250862 | Bacillus sp. HIL-Y85/54728 mersacidin biosynthesis gene cluster (mrsK2, mrsR2, mrsM and mrsT genes) | 1.20E-05
89 | 3547.D19.GZ43_505986 | AF050491 | Microgadus tomcod aromatic hydrocarbon receptor (ahr) gene, exons 8-11, partial cds | [illegible]
90 | 3547.D23.GZ43_506050 | | Rat cytochrome P450 IIA3 (CYP2A3) gene, complete cds | [illegible]
91 | 3547.E04.GZ43_505747 | | Homo sapiens (subclone H8 9_d12 from P1 35 H5 C8) DNA sequence | 7.70E-07
92 | 3547.F02.GZ43_505716 | AF038190 | |
93 | 3547.F10.GZ43_505844 | AY008833 | Staphylococcus aureus tcaR-tcaA-tcaB | 5.00E-06
94 | 3547.F20.GZ43_506004 | AB037821 | Homo sapiens mRNA for KIAA1400 | 1.00E-06
95 | 3547.G02.GZ43_505717 | M88397 | Naegleria fowleri virulence-related protein (NF314) mRNA, complete cds | 3.70E-07
96 | 3547.G09.GZ43_505829 | AJ315644 | inositol symporter (Hmit gene) | 7.90E-07
97 | 3547.G22.GZ43_506037 | Z33603 | P.radiata (Prl.6) microsatellite DNA, 703bp | 1.70E-07
98 | 3547.H12.GZ43_505878 | | Shigella flexneri ipgD, ipgE, ipgF genes, | [illegible]
99 | 3547.H14.GZ43_505910 | AL137502 | Homo sapiens mRNA; cDNA DKFZp761H171 (from clone DKFZp761H171); partial cds | [illegible]
100 | 3547.I07.GZ43_505799 | | B.sphaericus ermG gene encoding rRNA methyltransferase (macrolide-lincosamide-streptogramin B resistance element) | [illegible]
101 | 3547.I16.GZ43_505943 | AF015157 | Homo sapiens clone HS19.12 Alu-Ya5 sequence | 4.70E-10
102 | 3547.I17.GZ43_505959 | AE007758 | Clostridium acetobutylicum ATCC824 | 3.00E-06
103 | 3547.I20.GZ43_506007 | L37606 | Medicago sativa (clone GG16-1) NADH- complete cds | 1.50E-05
104 | 3547.J05.GZ43_505768 | Z16911 | H. sapiens (D20S113) DNA segment containing (CA) repeat; clone AFM205th8; single read | 2.80E-07
105 | 3547.J10.GZ43_505848 | Z37803 | HIV-1 DNA V3 region (patient 15, sample CSF, clone 9) | 8.80E-07
106 | 3547.J20.GZ43_506008 | AF013273 | Candida albicans histidine kinase 1 gene, complete cds | 3.30E-05
107 | 3547.J22.GZ43_506040 | AF289080 | Lycopersicon esculentum alpha-galactosidase gene, partial cds | 4.00E-06
108 | 3547.K01.GZ43_505705 | AF267863 | Homo sapiens DC43 mRNA, complete cds | 7.30E-22
109 | 3547.L09.GZ43_505834 | Z22175 | Caenorhabditis elegans cosmid K01F9, complete sequence | 1.40E-05
110 | 3547.L11.GZ43_505866 | AJ288648 | Limnodynastes tasmaniensis mitochondrial partial nadh4 gene for NADH dehydrogenase subunit 4 and partial tRNA-His gene, sample 26 from Australia.Boolara | 5.90E-07
111 | 3547.L16.GZ43_505946 | AE001293 | Chlamydia trachomatis section 20 of 87 of the complete genome | 7.10E-07
112 | 3547.L22.GZ43_506042 | AF287006 | Danio rerio T-box brain 1 mRNA, partial cds | 7.00E-06
113 | 3547.M02.GZ43_505723 | AE007788 | Clostridium acetobutylicum ATCC824 section 276 of 356 of the complete genome | 1.00E-05
114 | 3547.M07.GZ43_505803 | Z46252 | M.musculus DNA for region surrounding retrovirus restriction locus Fv1 | 6.00E-06
115 | 3547.M08.GZ43_505819 | AB020684 | Homo sapiens mRNA for KIAA0877 protein, partial cds | 1.50E-05
116 | 3547.M16.GZ43_505947 | AF335240 | Petunia x hybrida MADS-box transcription factor FBP22 (FBP22) mRNA, complete cds | 3.00E-06
117 | 3547.N06.GZ43_505788 | AF299346 | Kemspora ilavissima isolate <JjbSH313 18S ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene and internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence | 1.70E-08
118 | 3547.O03.GZ43_505741 | AE002344 | Chlamydia muridarum, section 72 of 85 of | 6.60E-07
119 | 3547.O07.GZ43_505805 | D50608 | Rat gene for cholecystokinin type-A receptor (CCKAR), complete cds | 1.60E-05
120 | 3547.O14.GZ43_505917 | AL137502 | Homo sapiens mRNA; cDNA DKFZp761H171 (from clone DKFZp761H171); partial cds | 2.90E-07
121 | 3547.P18.GZ43_505982 | AJ131734 | Plasmodium berghei DNA including upstream sequence NTS and 5'ETS of the 18S rRNA gene (A rRNA gene unit) | 6.10E-07
122 | 3547.P21.GZ43_506030 | AC006619 | Caenorhabditis elegans cosmid C46C11, complete sequence | 1.70E-05
123 | 3547.P22.GZ43_506046 | AJ000871 | Streptococcus mitis comC, comD, comE genes, isolate B 5 | 2.00E-06
124 | 3550.A12.GZ43_506255 | M22310 | Human epidermal growth factor receptor proto-oncogene downstream enhancer | 4.80E-07
125 | 3550.A16.GZ43_506319 | L39435 | Senecio mikanioides chloroplast NADH dehydrogenase (ndhF) gene, complete cds | 2.00E-06
126 | 3550.B06.GZ43_506160 | D14161 | Hordeum vulgare ids-4 mRNA, complete cds | 1.10E-08
127 | 3550.C01.GZ43_506081 | AK025182 | Homo sapiens cDNA: FLJ21529 fis, clone COL05981 | 4.20E-09
128 | 3550.C22.GZ43_506417 | X52028 | Rattus norvegicus P450 IID3 gene | 1.41E-04
129 | 3550.D16.GZ43_506322 | Y10345 | H. sapiens GalNAc-T3 gene, 3'UTR | 5.00E-07
130 | 3550.D23.GZ43_506434 | AF134403 | Escherichia coli plasmid pAA2 Shf (shf), hexosyltransferase homolog (capU), and VirK (virK) genes, complete cds | 6.90E-07
131 | 3550.E02.GZ43_506099 | U66074 | Tritrichomonas foetus putative superoxide dismutase 2 (SOD2) gene, complete cds | 8.90E-07
132 | 3550.E06.GZ43_506163 | Z23341 | H. sapiens (D8S528) DNA segment containing (CA) repeat; clone AFM080xh7; single read | 2.30E-08
133 | 3550.F06.GZ43_506164 | M59447 | Drosophila melanogaster Sex-lethal (Sxl) mRNA, complete cds | 3.00E-06
134 | 3550.F08.GZ43_506196 | M24901 | Rabbit pulmonary surfactant-associated protein (SP-B) mRNA, complete cds | 3.40E-07
135 | 3550.F20.GZ43_506388 | AF216169 | Simicratea welwitschii clone 2 phytochrome B (PHYB) gene, exon 1 and partial cds | 5.40E-08
136 | 3550.F22.GZ43_506420 | AP000739 | Arabidopsis thaliana genomic DNA, chromosome 3, P1 clone:MEK6 | 2.20E-05
137 | 3550.G02.GZ43_506101 | AL022342 | Human DNA sequence from clone RP1-29M10 on chromosome 20, complete sequence [Homo sapiens] | 7.40E-05
138 | 3550.G08.GZ43_506197 | AK021312 | Mus musculus 13 days embryo stomach cDNA, RIKEN full-length enriched library, clone:D530039A21, full insert sequence | 3.60E-08
139 | 3550.G10.GZ43_506229 | M31684 | D. melanogaster cytoskeleton-like bicaudalD protein (BicD) mRNA, complete cds | 3.00E-06
140 | 3550.G15.GZ43_506309 | AF087141 | Mus musculus uncharacterized long terminal repeat, complete sequence; and valyl-tRNA synthetase (G7a) gene, complete cds | 4.00E-06
141 | 3550.G23.GZ43_506437 | X02547 | Trypanosoma brucei mitochondrial genes for 12S and 9S ribosomal RNA | 2.00E-06
142 | 3550.H10.GZ43_506230 | U55711 | Human ataxia-telangiectasia (ATM) gene, exon 11 | 6.10E-08
143 | 3550.H21.GZ43_506406 | Z68755 | Human DNA sequence from cosmid L118D5, Huntington's Disease Region, chromosome 4p16.3 | 2.00E-06
144 | 3550.H23.GZ43_506438 | AF151388 | Dermatobia hominis strain Alienas tRNA-Ile gene, partial sequence; D-loop, complete sequence; and 12S ribosomal RNA, partial sequence; mitochondrial genes for mitochondrial products | 1.20E-07
145 | 3550.I03.GZ43_506119 | AF117258 | Staphylococcus aureus plasmid pIP680 replication protein RepE (repE) gene, partial cds; resolvase (res), acetyltransferase complete cds; and unknown gene | 6.50E-08
146 | 3550.I19.GZ43_506375 | AE002781 | Drosophila melanogaster genomic scaffold 142000013385442, complete sequence | 3.90E-05
147 | 3550.I21.GZ43_506407 | AE001002 | Archaeoglobus fulgidus section 105 of 172 | 4.20E-05
148 | 3550.J05.GZ43_506152 | AF080689 | Homo sapiens protein kinase PITSLRE (CDC2L2) gene, exons 8 and 9 | 5.50E-10
149 | 3550.J11.GZ43_506248 | Z82761 | R.prowazekii genomic DNA fragment (clone A793R) | 1.00E-06
150 | 3550.K05.GZ43_506153 | X15407 | Maize pseudo-Gpa2 pseudogene for glyceraldehyde-3-phosphate dehydrogenase subunit A | 3.20E-05
151 | 3550.K09.GZ43_506217 | X62631 | | 1.50E-07
152 | 3550.K14.GZ43_506297 | M59743 | Rabbit cardiac muscle Ca-2+ release channel (ryanodine receptor) mRNA, | 1.00E-06
153 | 3550.L16.GZ43_506330 | AF201383 | Buchnera aphidicola isopropylmalate | 1.00E-06
154 | 3550.L19.GZ43_506378 | M77244 | H.sapiens erythropoietin receptor (EPOR) gene, 5' end | 4.00E-09
155 | | | | 8.00E-06
156 | 3550.M21.GZ43_50641 | M87339 | Human replication factor C, 37-kDa subunit | 5.00E-06
157 | 3550.N01.GZ43_506092 | AF191009 | Helicobacter pylori strain ChinaF30A cag type IIIa motif | 1.10E-07
158 | 3550.N07.GZ43_506188 | AF235499 | Mus musculus SH2-containing inositol 5-phosphatase (Ship) gene, exons 3 through 6 | 1.55E-04
159 | 3550.O03.GZ43_506125 | D14813 | Human DNA for osteopontin, complete cds | 4.50E-05
160 | 3550.O04.GZ43_506141 | U08596 | channel mRNA, partial cds | 6.00E-06
161 | 3550.O08.GZ43_506205 | XM_017044 | Homo sapiens similar to diaphanous (Drosophila, homolog) 2 (H. sapiens) (LOC91459), mRNA | 6.40E-09
162 | 3550.O15.GZ43_506317 | U15977 | Mus musculus long chain fatty acyl CoA synthetase mRNA, complete cds | 2.80E-05
163 | 3550.O17.GZ43_506349 | X62578 | C.caldarium plastid genes ompR', psbD, psbC, rps16 and groEL | 2.80E-05
164 | 3550.O18.GZ43_506365 | L34363 | Human X-linked nuclear protein (XNP) gene, complete cds | 4.00E-06
165 | 3550.O21.GZ43_506413 | AB056784 | Macaca fascicularis brain cDNA clone:QnpA-11501, full insert sequence | 5.20E-07
166 | 3550.P18.GZ43_506366 | AK002041 | PLACE1007450 | 5.30E-07
167 | 3550.P23.GZ43_506446 | AF200361 | (Cyp4F1) gene, complete cds | 1.40E-05
168 | 3553.A09.GZ43_506591 | AL109980 | Human DNA sequence from clone RP4- [illegible] | 3.50E-12
169 | 3553.B07.GZ43_506560 | L37056 | Strongylocentrotus purpuratus myc protein mRNA, complete cds | 4.60E-07
170 | 3553.B16.GZ43_506704 | U43542 | Nicotiana tabacum diphenol oxidase mRNA, complete cds | 2.00E-06
171 | 3553.B22.GZ43_506800 | L34040 | Homo sapiens stromelysin gene, promoter region | 6.00E-06
172 | 3553.D04.GZ43_506514 | Y07599 | | 9.40E-07
173 | 3553.D07.GZ43_506562 | | R.norvegicus [illegible] gene exons 3 4 & 5 | 2.00E-06
174 | | | Bacillus subtilis dihydropicolinate reductase (jojE) gene, complete cds; poly(A) polymerase (jojI) gene, complete cds; biotin complete cds; jojC, jojD, jojF, jojG, jojH genes, complete cds's | [illegible]
175 | 3553.D19.GZ43_506754 | X53431 | Yeast gene for STE11 | 9.00E-06
176 | 3553.E08.GZ43_506579 | AF062863 | Arabidopsis thaliana putative transcription | 1.80E-07
177 | 3553.E09.GZ43_506595 | | X.laevis XFG 5-1 and XFG 5-2 genes for | 6.60E-05
178 | 3553.F12.GZ43_506644 | X63223 | B.taurus CI-MNLL mRNA for ubiquinone | 6.90E-08
179 | 3553.F13.GZ43_506660 | | Homo sapiens (subclone 1_c4 from P1 H55) | 3.00E-08
180 | 3553.F19.GZ43_506756 | U97190 | Caenorhabditis elegans cosmid B0025, | 3.00E-06
181 | 3553.G05.GZ43_506533 | | beta-HKA=H,K-ATPase beta-subunit [rats, | 8.00E-06
182 | 3553.G06.GZ43_506549 | X68048 | Phaseolus vulgaris chloroplast DNA for | 5.00E-06
183 | 3553.G07.GZ43_506565 | AF068289 | Homo sapiens HDCMD34P mRNA, |
184 | 3553.G21.GZ43_506789 | Z33603 | P.radiata (Prl.6) microsatellite DNA, 703bp | 1.70E-07
185 | 3553.H06.GZ43_506550 | AF090901 | Homo sapiens clone HQ0195 PRO0195 mRNA, complete cds | 8.00E-07
186 | 3553.H09.GZ43_506598 | AF270105 | Staphylococcus epidermidis strain SR1 clone step.1049c09 genomic sequence | 9.80E-07
187 | 3553.H21.GZ43_506790 | Z18359 | Glycine max seed-specific low molecular weight sulfur-rich protein | 2.00E-06
188 | 3553.I13.GZ43_506663 | AF155115 | Homo sapiens NY-REN-58 antigen mRNA, complete cds | 1.70E-07
189 | 3553.I16.GZ43_506711 | AF270229 | Staphylococcus epidermidis strain SR1 | 1.20E-05
190 | 3553.J12.GZ43_506648 | U53400 | Rattus norvegicus chromosome 10 microsatellite sequence D10Mco21 | 1.55E-01
191 | 3553.J14.GZ43_506680 | M10014 | Homo sapiens map 4q28 fibrinogen (FGG) gene, alternative splice products, complete cds | 9.00E-06
192 | 3553.J16.GZ43_506712 | Z23341 | H. sapiens (D8S528) DNA segment containing (CA) repeat; clone AFM080xh7; single read | 2.30E-08
193 | 3553.J17.GZ43_506728 | AK021312 | Mus musculus 13 days embryo stomach cDNA, RIKEN full-length enriched library, clone:D530039A21, full insert sequence | 3.60E-08
194 | 3553.J22.GZ43_506808 | AF120279 | Mus musculus proline dehydrogenase mRNA, complete cds | 5.00E-06
195 | 3553.J24.GZ43_506840 | Z18359 | Glycine max seed-specific low molecular weight sulfur-rich protein | 2.00E-06
196 | 3553.K01.GZ43_506473 | U31465 | Kluyveromyces lactis telomerase RNA component (TER1) gene, complete sequence | 2.00E-06
197 | 3553.K02.GZ43_506489 | X60672 | M.musculus mRNA for radixin | 1.00E-06
198 | 3553.K03.GZ43_506505 | Z71943 | G.hyalina (92-89) DNA for internal transcribed spacer 1 | 1.06E-02
199 | 3553.K05.GZ43_506537 | M80596 | Saccharomyces cerevisiae VAC1 gene (required for vacuole inheritance and vacuole protein sorting), complete cds | 6.00E-06
200 | 3553.K07.GZ43_506569 | AJ275317 | Cicer arietinum partial mRNA for malate dehydrogenase | 7.60E-07
201 | 3553.K15.GZ43_506697 | X57377 | Mouse dilute myosin heavy chain gene for novel heavy chain with unique C-terminal region | 2.40E-05
202 | 3538.A11.GZ43_504703 | Z75199 | S.cerevisiae chromosome XV reading frame ORF YOR291w | 6.00E-06
203 | 3538.A24.GZ43_504911 | AF270077 | Staphylococcus epidermidis strain SR1 clone step.1047c06 genomic sequence | 2.00E-07
204 | 3538.B01.GZ43_504544 | AF368255 | Arabidopsis thaliana small zinc finger-like protein TIM13 mRNA, complete cds; nuclear gene for mitochondrial product | 4.10E-07
205 | 3538.B20.GZ43_504848 | AB069994 | Macaca fascicularis testis cDNA clone:QtsA-10636, full insert sequence | 1.40E-07
206 | 3538.C01.GZ43_504545 | AF072375 | Pseudoalteromonas sp. S9 beta-hexosaminidase (chiP) gene, complete cds | 1.50E-04
207 | 3538.C02.GZ43_504561 | AJ011271 | Human immunodeficiency virus type 2 partial env gene, isolate bl286 | 7.10E-08
208 | 3538.D06.GZ43_504626 | AF100765 | Oryza sativa receptor-like kinase (8ARK1) gene, complete cds | 3.00E-06
209 | 3538.D09.GZ43_504674 | Z47784 | M.musculus mRNA expressed in islet cells (clone 58) | 3.40E-08
210 | 3538.D21.GZ43_504866 | AE006568 | Streptococcus pyogenes M1 GAS strain SF370, section 97 of 167 of the complete genome | 6.00E-07
211 | 3538.E15.GZ43_504771 | AB027966 | Schizosaccharomyces pombe gene for clone:TB89 | 2.30E-08
212 | 3538.F02.GZ43_504564 | AK001871 | Homo sapiens cDNA FLJ11009 fis, clone PLACE1003108 | 5.90E-09
213 | 3538.F08.GZ43_504660 | U39161 | Human phosphodiesterase (PDEA) gene, | 7.40E-07
214 | 3553.K23.GZ43_506825 | Y12052 | Homo sapiens gene encoding guanine | 3.00E-06
215 | 3553.K24.GZ43_506841 | AP001419 | Homo sapiens genomic DNA, chromosome 21q22.2, clone:PAC24K9, LB7T-ERG | 1.00E-06
216 | 3553.L02.GZ43_506490 | X15028 | Chicken hsp90 gene for 90 kDa-heat shock protein 5'-end | 3.60E-05
217 | 3553.L04.GZ43_506522 | L34665 | subunit (HKB) gene, exon 6 | 1.20E-09
218 | 3553.L21.GZ43_506794 | M87843 | gene, 5' end | 2.30E-05
219 | 3553.M12.GZ43_50665 | AB026592 | Limnoporus esakii mitochondrial gene for | 1.10E-07
220 | 3553.M23.GZ43_506827 | AE006349 | Lactococcus lactis subsp. lactis IL1403 section 111 of 218 of the complete genome | 8.00E-07
221 | 3553.N01.GZ43_506476 | U80457 | mRNA, complete cds | 2.00E-06
222 | 3553.N02.GZ43_506492 | U81144 | Caenorhabditis elegans non-alpha nicotinic (unc-29) gene, complete cds | 3.20E-07
223 | 3553.N04.GZ43_506524 | AE006296 | section 58 of 218 of the complete genome | 2.00E-06
224 | 3553.N07.GZ43_506572 | AK026258 | Homo sapiens cDNA: FLJ22605 fis, clone HSI04743 | 1.00E-06
225 | 3553.N08.GZ43_506588 | U05822 | | 1.90E-14
226 | 3553.O07.GZ43_506573 | X97196 | D.melanogaster X gene | 4.00E-06
227 | 3553.O18.GZ43_506749 | AE001146 | Borrelia burgdorferi (section 32 of 70) of the complete genome | 1.60E-05
228 | 3553.O23.GZ43_506829 | X63509 | Mus musculus partial L1 gene, exons 2-4 | 6.00E-06
229 | 3553.P03.GZ43_506510 | M24901 | Rabbit pulmonary surfactant-associated protein (SP-B) mRNA, complete cds | 4.20E-07
230 | 3553.P05.GZ43_506542 | AF239156 | Homo sapiens peptide deformylase-like protein mRNA, complete cds | 1.00E-06
231 | 3553.P12.GZ43_506654 | AF283753 | Acipenser persicus isolate cw203 cytochrome b gene, partial cds; mitochondrial gene for mitochondrial product | 3.90E-07
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232 


3553.P18.GZ43 506750 


AK021312 


Mus xnusculus 13 days embryo stomach 
cDNA, RIKEN full-length enriched library, 
clone:D530039A21, full insert sequence 


3.50E-08 


233 


3553.P21.GZ43 506798 


AB044136 


Homo sapiens genomic DNA clone:#7 


4.00E-06 


234 


3556.A03.GZ43 506879 


X61084 


C.griseus rhodopsin gene for opsin protein 


4.30E-05 


235 


3556.A06.GZ43 506927 


L46904 


Homo sapiens (subclone 4 c6 from PI H22) 
DNA sequence 


1.20E-08 


236 


3556.B06.GZ43 506928 


AK002041 


Homo sapiens cDNAFLJ11179 fis, clone 
PLACE1007450 


1.40E-07 


237 


3556.B09.GZ43 506976 


U88832 


Human groucho protein homolog (AES) 
gene, exons 2-7 and complete cds 


7.00E-07 


238 


3556.B10.GZ43 506992 


M11925 


Influenza A/clucken/Pennsylvania/8125/83 
(H5N2) neuraminidase (NA) gene, complete 
cds 


5.00E-06 


239 


3556.B14.GZ43 507056 


Z80218 


Caenorhabditis elegans cosmid F52D4, 
complete sequence 


2.20E-05 


240 


3556.C13.GZ43 507041 


AF348512 


Mus musculus polyamine-modulated factor- 
1 gene, exons 2 through 5 and complete cds 


8.00E-06 


241 


3556.C15.GZ43 507073 


X82013 


S.cerevisiae mRNA for SUL1 


3.00E-06 


242 


3556.C18.GZ43 507121 


Z23973 


H. sapiens (D7S660) DNA segment 
rantaimng (CA) repeat; clone AFM277vd5; 
single read 


5.00E-06 


243 


3556.C24.GZ43 507217 


AE001381 


Plasmodium falciparum chromosome 2, 
section 18 of 73 of the complete sequence 


6.90E-07 


244 


3556.D15.GZ43 507074 


U48288 


Rattus norvegicus A-kinase anchoring
protein AKAP 220 mRNA complete cds 


5.50E-07 


245 


3556.D20.GZ43 507154 


AF092684 


Neochlamisus scabripennis haplotype 113 
cytochrome oxidase I (COI) gene,
mitochondrial gene encoding mitochondrial 
protein, partial cds 


4.00E-07 


246 


3556.D23.GZ43 507202 


X16416 


Human c-abl mRNA encoding p150 protein


2.25E-04 


247 


3556.E13.GZ43 507043 


AL049948 


Homo sapiens mRNA; cDNA 
DKFZp564K0222 (from clone 
DKFZp564K0222) 


6.60E-08 


248 


3556.E24.GZ43 507219 


Z57634 


H.sapiens CpG island DNA genomic Mse1
fragment, clone 187e9, forward read
cpg187e9.ft1a


8.70E-07 


249 


3556.F10.GZ43 506996 


AF025409 


Homo sapiens zinc transporter 4 (ZNT4) 
mRNA, complete cds 


3.90E-34 


250 


3556.G15.GZ43 507077 


X15407 


Maize pseudo-Gpa2 pseudogene for 
glyceraldehyde-3-phosphate dehydrogenase
subunit A 


3.40E-05 


251 


3556.H01.GZ43 506854 


AF269443 


Staphylococcus epidermidis strain SR1
clone step.1003h04 genomic sequence


3.00E-06 
252


3556.H02.GZ43 506870 


U31465 


Kluyveromyces lactis telomerase RNA 
component (TER1) gene, complete sequence 


3.00E-06 


253 


3556.H12.GZ43 507030 


Z68886 


Human DNA sequence from cosmid 
L21F12, Huntington's Disease Region, 
chromosome 4p16.3


1.70E-07 


254 


3556.H20.GZ43 507158 


AB034628 


Equus caballus microsatellite TKY319,
TKY320 DNA 


1.70E-07 


255 


3556.I02.GZ43 506871 


AL390767


Human DNA sequence from clone RP1-
68P15 on chromosome 11p13-14.2 Contains
GSSs and ESTs. Contains part of a novel
gene, complete sequence [Homo sapiens]


2.00E-06 


256 


3556.I14.GZ43 507063 


U34042 


Mus musculus mammalian tolloid-like
protein mRNA, complete cds 


1.50E-05 


257 


3556.J05.GZ43 506920 


U31465 


Kluyveromyces lactis telomerase RNA 
component (TER1) gene, complete sequence 


2.00E-06 


258 


3556.J07.GZ43 506952


AL359621 


Homo sapiens mRNA; cDNA 
DKFZp434M1631 (from clone 
DKFZp434M1631) 


2.00E-06 


259 


3556.J14.GZ43 507064 


M81830 


Human somatostatin receptor isoform 2 
(SSTR2) gene, complete cds 


1.00E-06 


260 


3556.J16.GZ43 507096


Y15484 


Canis familiaris gene encoding retinal 
guanylate cyclase E 


2.90E-08 


261 


3556.K04.GZ43 506905 


U88832 


Human groucho protein homolog (AES) 
gene, exons 2-7 and complete cds 


8.00E-07 


262 


3556.K12.GZ43 507033 


AP001419 


Homo sapiens genomic DNA, chromosome 
21q22.2, clone:PAC24K9, LB7T-ERG
region, complete sequence 


1.00E-06 


263 


3556.K13.GZ43 507049 


AK023589 


Homo sapiens cDNA FLJ13527 fis, clone 
PLACE1006076 


2.00E-06 


264 


3556.K17.GZ43 507113 


X71634 


D.bifasciata P-Transposon 


3.00E-06 


265 


3556.L08.GZ43 506970


X02367 


Glaucoma chattoni rDNA 3' NTS


8.20E-08 


266 


3556.L09.GZ43 506986 


AF154329 


Pisum sativum MAP kinase PsMAPK2 
(Mapk2) mRNA, complete cds 


4.10E-07 


267 


3556.L16.GZ43 507098 


AB041791 


Homo sapiens HSPDE10A gene for 
phosphodiesterase 10A1 (PDE10A1), exon 
17 


3.10E-08 


268 


3556.L23.GZ43 507210 


M23720 


Rat carboxypeptidase (CA2) gene, exon 10


5.00E-06 


269 


3556.M02.GZ43_506875


U91963 


Human tolloid-like protein (TLL) mRNA 
complete cds 


1.40E-05 


270 


3556.M11.GZ43_507019


X16353 


R.rickettsii ompB gene for outer membrane 
protein B 


7.60E-05 


271 


3556.M23.GZ43_507211


X93496 


H.sapiens TRAP gene, 5' flanking region 


5.60E-23 


272 


3556.N02.GZ43 506876 


U26458 


Snakehead retrovirus (SnRV), complete 
genome 


3.20E-05 


273 


3556.N04.GZ43 506908 


L39064 


Homo sapiens interleukin 9 receptor 
precursor (IL9R) gene, complete cds 


4.00E-09 
274


3556.N05.GZ43 506924 


M63437 




2.00E-06


275 


3556.N06.GZ43 506940 


AF327424 


Arabidopsis thaliana unknown protein 
(T14P1.19/At2g45010) mRNA, partial cds 


2.00E-07 


276 


3556.N21.GZ43 507180 


AB022157 


Mus musculus Cctd gene for chaperonin 
containing TCP-1 delta subunit, complete


4.00E-06 


277


3556.O08.GZ43 506973 


X00171 


Vibrio cholera toxin (ctx) operon DNA 


7.00E-06


278 


3556.O13.GZ43 507053


U41106 


Caenorhabditis elegans cosmid W06A11


1.20E-05 


279 


3556.P07.GZ43 506958 


M15085 


T.brucei expressed copy of the ILTat 1.3




280 


3559.A04.GZ43 507279 


AE006824 


Sulfolobus solfataricus section 183 of 272 of


4.70E-05


281 


3559.A20.GZ43 507535 


X71787 


A.thaliana AAP2 mRNA for amino acid 


2.00E-06


282 


3559.A24.GZ43 507599 


X56494 


H.sapiens M gene for M1-type and M2-type
pyruvate kinase


1.80E-05


283 


3559.B04.GZ43 507280 


AJ251550 


Homo sapiens partial AK155 gene for 

AK155 protein, exons 1-3 and joined CDS


2.50E-05 


284 


3559.B06.GZ43 507312 


AF077344 


lectin (CLECSF1) gene, exons 1 and 2 


5.80E-05 


285 


3559.B08.GZ43 507344 


D50552 


Xenopus laevis xSox12 mRNA for
XSOX12, complete cds 


4.00E-07 


286 


3559.B10.GZ43 507376 


L76259 


Homo sapiens PTS gene, complete cds 


9.00E-06 


287 


3559.B18.GZ43 507504 


M29109 


D.discoideum actin M6 gene, 5' flank


3.40E-07 


288 


3559.C06.GZ43 507313 


X99910 




1.60E-05


289 


3559.D21.GZ43 507554 


AK022877 


Homo sapiens cDNA FLJ12815 fis, clone
NT2RP2002546 


2.00E-06


290 


3559.E06.GZ43 507315 


U97408 


Caenorhabditis elegans cosmid F48A9 


3.00E-06 


291 


3559.E09.GZ43 507363 


L40489 


Ureaplasma urealyticum UreA (ureA), UreB
(ureB), UreC (ureC), UreE (ureE), UreF 
(ureF), and UreG (ureG) genes, complete 
cds; UreD (ureD) gene, partial cds; and 


3.00E-07 


292 


3559.E20.GZ43 507539 


AF113521 


Zea mays putative transcription factor 
mRNA sequence 


8.20E-08


293


3559.F07.GZ43 507332


AF109377 


Mus musculus ldlBp (LDLB) mRNA 


4.30E-05


294 


3559.F17.GZ43 507492


U11292 


Human Ki nuclear autoantigen mRNA, 


6.40E-07


295 


3559.H09.GZ43 507366 


X13414 


Murine I gene for MHC class II(Ia) 
associated invariant chain 


9.00E-06 


296 


3559.H22.GZ43 507574 


U61402 


Streptococcus thermophilus GalR (galR), 
galactokinase (galK) and gal-1-P
uridylyltransferase (galT) genes, complete 
cds 


1.00E-06 


297 


3559.H24.GZ43 507606 


U67594


Methanococcus jannaschii section 136 of
150 of the complete genome 


3.40E-05 


298 


3559.I05.GZ43 507303 


X97289 


S.salar genes encoding alpha-globin and 
beta-globin, clone 6 


7.00E-06 
299


3559.J04.GZ43 507288 


L10709 


Human constitutive endothelial nitric oxide 
synthase gene, exons 25 and 26 and 
complete cds 


8.90E-12 


300 


3559.J20.GZ43 507544 


U67559 


Methanococcus jannaschii section 101 of 
150 of the complete genome 


5.70E-05 


301 


3559.K16.GZ43 507481 


Z48955 


D.virginiana partial LINE-1 repetitive DNA 
and putative RT 


2.40E-08 


302 


3559.K17.GZ43 507497 


AC004497 


Homo sapiens chromosome 21, P1 clone
LBNL#6 (LBNL H10), complete sequence 


4.00E-06 


303 


3559.L01.GZ43 507242


X58774 


Herpesvirus saimiri sRNA1, sRNA2,
sRNA3 and sRNA4 genes for small viral 
RNAs 


1.00E-06 


304 


3559.L14.GZ43 507450 


X67774 


C.upsaliensis (LMG 8854) 23S rRNA gene


1.30E-05 


305 


3559.L19.GZ43 507530 


Z57634 


H.sapiens CpG island DNA genomic Mse1
fragment, clone 187e9, forward read
cpg187e9.ft1a


7.70E-07 


306 


3559.M02.GZ43_507259


AF042834 


Homo sapiens phosphodiesterase delta 
subunit gene, exons 2, 3 and 4 


1.30E-05 


307 


3559.M09.GZ43_507371


U07628 


Caenorhabditis elegans N2 APX-1 (apx-1) 
mRNA, complete cds 


2.00E-06 


308 


3559.N05.GZ43 507308 


Z24259 


H. sapiens (D19S417) DNA segment
containing (CA) repeat; clone AFM304zg1;
single read 


3.70E-07 


309 


3559.N18.GZ43 507516 


S75829 


{dinucleotide repeats, microsatellite 
marker} [Dryobalanops lanceolata, 
Genomic, 230 nt] 


1.90E-07 


310 


3559.N21.GZ43 507564 


AL353948 


Homo sapiens mRNA; cDNA 
DKFZp761P0114 (from clone 
DKFZp761P0114) 


5.30E-07 


311 


3559.O01.GZ43 507245 


AL110269


Homo sapiens mRNA; cDNA 
DKFZp564A122 (from clone 
DKFZp564A122); partial cds 


1.60E-17 


312 


3559.O05.GZ43 507309 


Y08695 


Clostridium tertium nanH gene 


7.40E-07 


313 


3559.O07.GZ43 507341 


AJ249489 


Xenopus laevis partial mRNA for putative 
olfactory receptor (xb6 gene) 


5.40E-07 


314 


3559.O20.GZ43 507549 


X02886 


Human gene for T-cell receptor alpha chain 
J region 


2.00E-06 


315 


3559.P10.GZ43 507390 


X66030 


Homo sapiens partial ufo gene encoding 
tyrosine kinase receptor 


4.90E-07 


316 


3559.P15.GZ43 507470 


Z16777 


H. sapiens (D2S139) DNA segment 
containing (CA) repeat; clone AFM177xh4; 
single read 


4.00E-06 


317 


3559.P18.GZ43 507518 


AJ228072 


Nicotiana benthamiana DNA for Tnt1
retrotransposable element, isolate ben15


2.80E-07 


318 


3559.P24.GZ43 507614 


U32372 


Rattus norvegicus tyrosine-ester
sulfotransferase mRNA, complete cds


4.90E-07 


319 


3562.A01.GZ43 507615 


AE000496 


Escherichia coli K12 MG1655 section 386
of 400 of the complete genome 


1.56E-04 


320 


3562.A15.GZ43 507839 


AF068289 


Homo sapiens HDCMD34P mRNA, 
complete cds 


6.60E-11 
321


3562.B22.GZ43 507952 


AK014534 


Mus musculus 0 day neonate skin cDNA, 
RIKEN full-length enriched library, 
clone:4631424J17, full insert sequence 


1.10E-07


322


3562.C23.GZ43 507969


L01787 


Ascaris suum phosphoenolpyruvate


3.70E-07 


323 


3562.D10.GZ43 507762 


X54061 


D. melanogaster mRNA coding for a 205K 
microtubule-associated protein (MAP) 


6.60E-07 


324 


3562.E01.GZ43 507619 


M29812 


Homo sapiens Ig H-chain V71-4 (IGH@) 
gene, partial cds 


1.50E-05


325


3562.E03.GZ43 507651


X03729 


Vaccinia virus late gene cluster from central 
portion of genome containing the L65 gene 


2.63E-04 


326 


3562.E12.GZ43 507795 


M29694 


B.licheniformis RNA polymerase sigma-30
factor (spoOH) gene, complete cds 


1.60E-05


327


3562.F19.GZ43 507908


AE006216 


Pasteurella multocida PM70 section 183 of
204 of the complete genome 


2.40E-05 


328 


3562.F20.GZ43 507924 


M91004 


Rabbit endothelial leukocyte adhesion 
molecule 1 (ELAM1), complete cds 


2.00E-06 


329 


3562.G13.GZ43 507813 


X69818 


E.muelleri COLF1 gene for extracellular 

matrix protein 


1.00E-06


330


3562.G19.GZ43 507909


AK019034 


Mus musculus 10 day old male pancreas 
cDNA, RIKEN full-length enriched library, 
clone:1810049K24, full insert sequence 


1.00E-05 


331 


3562.H11.GZ43 507782 


AF206598 


Algyroides fitzingeri 12S ribosomal RNA 
gene, partial sequence; tRNA-Val gene, 
complete sequence; and 16S ribosomal RNA 
gene, partial sequence; mitochondrial genes 
for mitochondrial products 


1.40E-07 


332 


3562.H12.GZ43 507798 


AK002041 


Homo sapiens cDNA FLJ11179 fis, clone
PLACE1007450 


5.30E-07 


333 


3562.I01.GZ43 507623


S39048


knob associated histidine-rich protein
KAHRP {5'region} [Plasmodium 
falciparum, Genomic, 2215 nt] 


2.00E-06 


334 


3562.I02.GZ43 507639 


AF129501 


Buchnera aphidicola natural-host Diuraphis 
noxia acetohydroxy acid synthase large 
subunit (ilvI) and acetohydroxy acid
synthase small subunit (ilvH) genes, 
complete cds; and unknown genes 


1.60E-07 


335 


3562.I13.GZ43 507815 


M26049 


Yeast (S.cerevisiae) RAD9 protein (required 
for cell cycle arrest during DNA repair) 
gene, complete cds 


4.00E-06 


336 


3562.I15.GZ43 507847 


AF310880


Barbatula barbatula microsatellite Bbar5
sequence 


1.60E-07 
337


3562.J09.GZ43 507752 


AF236642 


Calothrix parietina clone 102-2A 16S-23S
internal transcribed spacer, complete
sequence; and tRNA-Ile and tRNA-Ala
genes, complete sequence 


3.30E-07 


338 


3562.J13.GZ43 507816


AC010728


Homo sapiens BAC clone RP11-258E22


1.30E-05 


339 


3562.K04.GZ43 507673 


S79777 


{specific DNA probe for Plasmodium vivax 
pARC 1153} [Plasmodium vivax, 


5.40E-07 


340 


3562.K08.GZ43 507737 


AJ403240 


M.musculus DNA for vimentin-binding


2.00E-06 


341 


3562.L12.GZ43 507802 


AE007840 


Clostridium acetobutylicum ATCC824 
section 328 of 356 of the complete genome 


5.80E-07 


342 


3562.N24.GZ43 507996 


AF255609 


Homo sapiens high mobility group protein 
HMG1 gene, exons 1 and 2, partial cds 


2.00E-07 


343 


3562.O11.GZ43 507789


M15027 


Human myelin proteolipid protein gene,
exon 2


1.00E-06 


344 


3562.O18.GZ43 507901


AL050208 


Homo sapiens mRNA; cDNA 
DKFZp586F2323 (from clone 
DKFZp586F2323) 


2.40E-07 


345 


3562.O20.GZ43 507933 


AY020756 


Oryza sativa microsatellite MRG3081 
containing (TA)X13, genomic sequence 


4.90E-08 


346 


3562.P21.GZ43 507950 


AF036318 


Skeletonema costatum cyclin (CYCL) gene, 
partial cds 


7.20E-07 


347 


3562.P23.GZ43 507982 


AF126719 


Plasmodium falciparum cAMP-dependent 
protein kinase (pka) gene, complete cds 


3.00E-06 


348 


3565.A23.GZ43 508351 


AL122065 


Homo sapiens mRNA; cDNA 
DKFZp434N011 (from clone 
DKFZp434N011) 


1.50E-07 


349 


3565.B05.GZ43 508064 


AF163325 


Trichoderma harzianum mitochondrial 
plasmid pHurl, complete plasmid sequence 


1.50E-07 


350 


3565.B13.GZ43 508192 


X62689 


T.retusa DNA for brachiopod cubitus- 
interruptus dominant (ciD) homologue 


9.00E-06 


351 


3565.B14.GZ43 508208 


M29929 


Human insulin receptor (allele 1) gene, 
exons 14, 15, 16 and 17 


4.30E-12 


352 


3565.C04.GZ43 508049 


AE006183 


Pasteurella multocida PM70 section 150 of 
204 of the complete genome 


2.00E-06 


353 


3565.C06.GZ43 508081 


S79836 


SCPx/SCP2=sterol carrier protein x/sterol 
carrier protein 2 {promoter} [human, 
Genomic, 3575 nt] 


3.00E-06 


354 


3565.C17.GZ43 508257 


L13937 


Bovine phospholipase C mRNA, complete 
cds 


3.00E-07 


355 


3565.D14.GZ43 508210 


M37818 


Human keratin (psi-K-alpha) pseudogene, 
exons 4,5,6,7 and 8, and keratin (psi-K- 
beta) pseudogene, complete cds 


3.50E-08 


356 


3565.D17.GZ43 508258 


Z19005 


C.pasteurianum gene for ferredoxin 


1.00E-06 
357


3565.D19.GZ43 508290


AE007758 


Clostridium acetobutylicum ATCC824
section 246 of 356 of the complete genome


3.00E-06 


358 


3565.E16.GZ43 508243 


L42813 


Protopterus dolloi complete mitochondrial
genome 


2.49E-04 


359 


3565.G07.GZ43 508101 


U97500 


Homo sapiens butyrophilin (BT3.3) gene, 
exons 1-4 


1.30E-05 


360 


3565.G09.GZ43 508133 


M95098 


Bos taurus lysozyme gene (cow 2), complete 
cds 


1.26E-04 


361 


3565.G22.GZ43 508341 


AJ400873 


Homo sapiens partial GPLD1 gene for 
glycosylphosphatidylinositol phospholipase
D, exons 15-20 


1.40E-09 


362 


3565.H06.GZ43 508086 


U67465 


Methanococcus jannaschii section 7 of 150 
of the complete genome 


6.10E-07 


363 


3565.H10.GZ43 508150 


M15350 


Bacillus sp. strain 170 beta-lactamase gene, 
complete cds 


J. /Uli-Uo 


364 


3565.H11.GZ43 508166 


AB044878 


Equus caballus DNA, microsatellite
TKY378 


3.20E-09 


365 


3565.H15.GZ43 508230 


AL122122


Homo sapiens mRNA; cDNA 
DKFZp434L098 (from clone 
DKFZp434L098) 


5.00E-06 


366 


3565.H23.GZ43 508358 


J05492 


E.coli cytochrome O ubiquinol oxidase 
(cyoA, cyoB, cyoC, cyoD and cyoE genes, 
complete cds 


1.00E-06 


367 


3565.H24.GZ43 508374 


AE001417 


Plasmodium falciparum chromosome 2, 
section 54 of 73 of the complete sequence 


1.70E-10


368 


3565.K15.GZ43 508233 


AB062985 


Macaca fascicularis brain cDNA 
clone:QmoA-10670 full insert sequence


6.90E-105


369 


3565.L22.GZ43 508346 


L81801 


Homo sapiens (subclone 1_a2 from P1 H31)
DNA sequence, complete sequence 


1.30E-05 


370 


3565.M15.GZ43_508235


X08038 


Methanobacterium thermoautotrophicum 
rpoT, rpoU, rpoV and rpoX genes for RNA
polymerase subunits A, B', B" and C 


1.10E-05 


371 


3565.M20.GZ43_508315


Z93381 


Caenorhabditis elegans cosmid F28G4 
complete sequence 


1.20E-05 


372 


3565.N12.GZ43 508188 


M21573 


Salmon (S.salar) growth hormone gene 
complete cds 


5.70E-05 


373 


3565.N13.GZ43 508204 


AK001163 


Homo sapiens cDNA FLJ10301 fis, clone
NT2RM2000032 




374 


3565.N19.GZ43 508300 


AF321321 


Homo sapiens dopamine transporter 
(SLC6A3) gene, exon 15 and complete cds 


2.00E-06 


375 


3565.O02.GZ43 508029 


X59773 


Pisum sativum mRNA for P protein, a part 
of glycine cleavage complex 


1.30E-05 


376 


3565.O03.GZ43 508045 


Z27113 


H.sapiens gene for RNA polymerase II
subunit 14.4 kD 


2.00E-15 


377 


3565.O07.GZ43 508109 


X96607 


M.musculus IgH 3' alpha enhancer DNA


6.40E-05 


378 


3565.O15.GZ43 508237


Z35484 


Thermoanaerobacter sp. ATCC53627 cgtA
gene 


3.00E-06
379


3565.P03.GZ43 508046 


U11292 


Human Ki nuclear autoantigen mRNA, 
complete cds 


6.40E-07


380 


3565.P09.GZ43 508142 


X56261 


Yeast PPH1 gene for protein phosphatase 
2A 


1.00E-06 


381 


3565.P22.GZ43 508350 


AE007790 


Clostridium acetobutylicum ATCC824
section 278 of 356 of the complete genome 


3.00E-06 


382 


3565.P24.GZ43 508382 


X61146 


N.tabacum NTP303 pollen specific mRNA 


2.70E-05 


383 


3568.A10.GZ43 508545 


U46925 


Arabidopsis thaliana GTP-binding protein 
ATGB2 mRNA complete cds 


3.00E-06 


384 


3568.B02.GZ43 508418 


U83640 


Mus caroli Sp100 gene, exons 3 and 4


1.90E-08 


385 


3568.B05.GZ43 508466 


BC008293 


Homo sapiens, Similar to RIKEN cDNA
A430101B06 gene, clone MGC:13017
IMAGE:3537789, mRNA, complete cds


3.20E-16 


386 


3568.C22.GZ43 508739 


AF280797 


Homo sapiens NPC-related protein NAG73 
mRNA, complete cds 


1.00E-06 


387 


3568.D23.GZ43 508756 


AK022922 


Homo sapiens cDNA FLJ12860 fis, clone
NT2RP2003559 


8.00E-06 


388 


3568.E17.GZ43 508661 


AF068294 


Homo sapiens HDCMB45P mRNA, partial 
cds 


5.30E-09 


389 


3568.E20.GZ43 508709 


AE006417 


Lactococcus lactis subsp. lactis IL1403 
section 179 of 218 of the complete genome 


1.10E-05 


390 


3568.F06.GZ43 508486 


U52198 


Vibrio anguillarum flagellin E (flaE),
flagellin D (flaD), and flagellin B (flaB)
genes, complete cds, and (flaG) gene, 
partial cds 


2.20E-05 


391 


3568.F07.GZ43 508502 


Z23599 


H. sapiens (D13S263) DNA segment 
containing (CA) repeat; clone 
AFM210yg11; single read


1.90E-08 


392 


3568.F11.GZ43 508566 


AE007525 


Clostridium acetobutylicum ATCC824
section 13 of 356 of the complete genome 


4.20E-07 


393 


3568.F12.GZ43 508582 


D50416 


Mouse mRNA for AREC3, complete cds 


1.90E-05 


394 


3568.F22.GZ43 508742 


AF025900 


Histrionicus histrionicus CA dinucleotide 
repeat locus Hhimicro 1 


7.80E-07


395 


3568.G10.GZ43 508551 


U66074 


Tritrichomonas foetus putative superoxide 
dismutase 2 (SOD2) gene, complete cds 


9.70E-07


396 


3568.G12.GZ43 508583 


AB062941 


Macaca fascicularis brain cDNA clone:QflA 
14927, full insert sequence 


9.50E-47 


397 


3568.G24.GZ43 508775 


L27221 


Giardia intestinalis pyruvate:flavodoxin
oxidoreductase and flanking genes 


3.20E-05 


398 


3568.H20.GZ43 508712 


X75887 


B.taurus Brevican mRNA


4.70E-05 


399 


3568.J10.GZ43 508554 


AF194829 


Tetragonia tetragonioides NADH 
dehydrogenase (ndhF) gene, partial cds; 
chloroplast gene for chloroplast product 


2.00E-06 


400 


3568.J22.GZ43 508746 


Y11031 


C.coli pldA gene


1.00E-06 


401 


3568.K01.GZ43 508411 


AL137751 


Homo sapiens mRNA; cDNA 
DKFZp434I0812 (from clone 
DKFZp434I0812); partial cds 


3.00E-06
402


3568.K04.GZ43 508459


Z82295 


R.prowazekii genomic DNA fragment 
(clone A153F) 


7.20E-08 


403 


3568.L04.GZ43 508460 


AL050105 


Homo sapiens mRNA; cDNA 
DKFZp586H0519 (from clone 
DKFZp586H0519); partial cds


1 AAT7 r\< 


404 


3568.M03.GZ43_508445


L76259 


Homo sapiens PTS gene, complete cds 


8.00E-06 


405 


3568.M13.GZ43_508605


X61218 


M.musculus cervicolor (strain CRP) Tcp-1 
gene for t-complex polypeptide 1, exons 8-
10


3.10E-09 


406 


3568.N11.GZ43 508574 


AL079296 


Homo sapiens mRNA full length insert 
cDNA clone EUROIMAGE 609395 


2.00E-06 


407 


3568.017.GZ43 508671 


AF078848 


Homo sapiens BUP mRNA, complete cds 


9.50E-09


408 


3568.P04.GZ43 508464 


AB041548 


Mus musculus brain cDNA, clone MNCb- 
3816, similar to AF171875 g1-related zinc
finger protein (Mus musculus) 


5.00E-06


409 


3568.P18.GZ43 508688 


AL358951 


Human DNA sequence from clone RP3- 
456L16 on chromosome 6, complete 
sequence [Homo sapiens] 


3.00E-07


410 


3568.P19.GZ43 508704 


U43542 


Nicotiana tabacum diphenol oxidase 
mRNA, complete cds


2.00E-06


411 


3571.A04.GZ43 508833 


AF017116 


Homo sapiens type-2 phosphatidic acid
phosphohydrolase (PAP2) mRNA, complete 


2.40E-07 


412 


3571.A07.GZ43 508881 


L81867 


Homo sapiens (subclone 1_a8 from P1 H54)
DNA sequence, complete sequence 


9.00E-06 


413 


3571.A08.GZ43 508897 


X85041 


H. sapiens PE5L gene ALU repeat region 


2.00E-06 


414 


3571.A11.GZ43 508945 


U19361 


Petromyzon marinus neurofilament subunit
NF-180 mRNA, complete cds 


4.70E-08 


415 


3571.A14.GZ43 508993 


AL022342 


Human DNA sequence from clone RP1- 
29M10 on chromosome 20, complete 
sequence [Homo sapiens]


7.00E-05 


416 


3571.A22.GZ43 509121 


U09448 


Vaucheria bursata protein synthesis
elongation factor Tu (tufA) gene, 
chloroplast gene encoding chloroplast 
protein, partial cds 


7.20E-07


417 


3571.B13.GZ43 508978 


AE002555 


Neisseria meningitidis serogroup B strain 
MC58 section 197 of 206 of the complete
genome 


4.40E-05 


418 


3571.B22.GZ43 509122 


AE002555 


Neisseria meningitidis serogroup B strain 
MC58 section 197 of 206 of the complete 
genome 


4.50E-05 


419 


3571.C08.GZ43 508899 


AJ010154 


Saguinus oedipus msp-El gene 


1.10E-17 


420 


3571.D04.GZ43 508836 


AF125460


Caenorhabditis elegans cosmid Y9D1A


3.60E-07 


421 


3571.D07.GZ43 508884 


U51654 


Barbus barbus x Barbus meridionalis 
microsatellite clone no.37


8.72E-02 


422 


3571.E02.GZ43 508805 


AF329081 


Bos taurus AMP-activated protein kinase
gamma-1 (PRKAG1) gene, partial cds


4.40E-33 


423 


3571.E10.GZ43 508933 


M96068 


Madagascar periwinkle
hydroxymethylglutaryl-CoA reductase
(HMGR) mRNA, complete cds 


3.30E-08 
424


3571.E16.GZ43 509029 


AE006429 


Lactococcus lactis subsp. lactis IL1403 
section 191 of 218 of the complete genome 


1.30E-05 


425 


3571.F06.GZ43 508870 


AL137296


Homo sapiens mRNA; cDNA 
DKFZp434M0416 (from clone 
DKFZp434M0416) 


4.40E-07 


426 


3571.F16.GZ43 509030 


M58478 


Human cystic fibrosis transmembrane 
conductance regulator gene, 5' end 


6.30E-05 


427 


3571.F23.GZ43 509142 


AF038397 


Mus musculus glutaminase (Gls) gene,
partial 3' sequence 


4.70E-08 


428 


3571.G22.GZ43 509127 


M80596 


Saccharomyces cerevisiae VAC1 gene 
(required for vacuole inheritance and 
vacuole protein sorting), complete cds 


7.00E-06 


429 


3571.G24.GZ43 509159 


Z75330 


H.sapiens mRNA for nuclear protein SA-1 


1.00E-46 


430 


3571.H01.GZ43 508792 


U71144 


Influenza A virus H3N2 A/Akita/1/94 
nucleoprotein (NP) gene, complete cds 


1.90E-05 


431 


3571.H10.GZ43 508936 


AF038564 


Homo sapiens atrophin-1 interacting protein 
4 (AIP4) mRNA, partial cds 


6.60E-53 


432 


3571.H12.GZ43 508968 


K00131 


mouse b2 repeat sequence from clone mm61 


3.00E-08 


433 


3571.H16.GZ43 509032 


AF179564 


Homo sapiens GTF2I-like sequence within 
duplicated segment of Williams syndrome 
region 


1.20E-23 


434 


3571.H18.GZ43 509064 


AE000331 


Escherichia coli K12 MG1655 section 221
of 400 of the complete genome 


1.45E-04 


435 


3571.I11.GZ43 508953


U20661 


Dictyostelium discoideum unknown internal 
repeat protein gene, complete cds, and 
unknown orf1, orf2 and orf3 genes, partial
cds 


9.00E-06 


436 


3571.J07.GZ43 508890 


M58478 


Human cystic fibrosis transmembrane 
conductance regulator gene, 5' end 


6.40E-05 


437 


3571.J08.GZ43 508906 


AK021312 


Mus musculus 13 days embryo stomach
cDNA, RIKEN full-length enriched library, 
clone:D530039A21, full insert sequence 


3.60E-08 


438 


3571.J09.GZ43 508922 


X66483 


D.discoideum gp80 gene


8.90E-07 


439 


3571.J14.GZ43 509002


L77119 


Methanococcus jannaschii small extra-
chromosomal element, complete sequence


1.40E-05 


440 


3571.L01.GZ43 508796 


AK005500 


Mus musculus adult female placenta cDNA,
RIKEN full-length enriched library,
clone:1600019O04, full insert sequence 


6.00E-06 


441 


3571.M17.GZ43_509053


AF085681 


Mus musculus tubby like protein 1 (Tulp1)
mRNA, complete cds 


5.00E-06 


442 


3571.M19.GZ43_509085


D10487


B.thermoglucosidasius gene for oligo-1,6- 
glucosidase 


9.00E-06 


443 


3571.M24.GZ43_509165


M97680


Bluetongue virus type 2 genomic RNA
sequence 


2.00E-06 


444 


3571.N09.GZ43 508926 


X86100 


R.norvegicus BSP gene 


3.40E-07 
445


3571.N14.GZ43 509006 


D32007 


Mouse mRNA for a homologue of human
CBFA2T1(Mtg8a), complete cds


1.20E-08 


446 


3571.N17.GZ43 509054 


Z68755 


Human DNA sequence from cosmid 
L118D5, Huntington's Disease Region,
chromosome 4p16.3


1.70E-10 


447 


3571.N22.GZ43 509134 


D00326 


Porcine rotavirus (strain Gottfried), VP6 
gene, complete cds 


1.00E-06 


448 


3571.O08.GZ43 508911 


X66483 


D.discoideum gp80 gene 


8.20E-07 


449 


3574.A20.GZ43 509473 


AJ271814 


Drosophila melanogaster mRNA for 
meso18E protein


1.70E-07 


450 


3574.B01.GZ43 509170 


U93261 


Homo sapiens DESP4P1 pseudogene 
sequence 


1.00E-06 


451 


3574.B04.GZ43 509218 


Y08207 


C.elaphus mitochondrial tRNA-Thr, tRNA- 
Pro and tRNA-Phe genes 




452 


3574.B10.GZ43 509314 


AL161991 


Homo sapiens mRNA; cDNA 
DKFZp761C169 (from clone 
DKFZp761C169); partial cds 


3.00E-06 


453 


3574.B14.GZ43 509378 


D79208 


Apis mellifera mRNA for alpha- 
glucosidase, complete cds 


7.00E-06 


454 


3574.B24.GZ43 509538 


AE007758 


Clostridium acetobutylicum ATCC824
section 246 of 356 of the complete genome 


3.00E-06 


455 


3574.C09.GZ43 509299 


AF057708 


Populus balsamifera subsp. trichocarpa PTD 
protein (PTD) gene, complete cds 




456 


3574.C10.GZ43 509315 


AE005602 


Escherichia coli O157:H7 EDL933 genome,
contig 3 of 3, section 221 of 290


9.70E-05 


457 


3574.C12.GZ43 509347 


AJ223633 


Enterococcus faecium genes encoding
enterocin L50A and enterocin L50B plus 5'
and 3' flanking regions


9.50E-07 


458 


3574.C14.GZ43 509379 


X99710 


L.lactis ORF, genes homologous to vsf-1 
and pepF2 and gene encoding protein 
homologous to methyltransferase 


4.00E-06 


459 


3574.C16.GZ43 509411 


AF092920 


Chlorohydra viridissima head-activator
binding protein precursor (HAB) mRNA, 
complete cds 


3.00E-07 


460 


3574.C23.GZ43 509523 


AB047856 


Oryza sativa Ub-CEP52-2 gene for ubiquitin 
fused to ribosomal protein L40, complete 


5.00E-08 


461 


3574.D02.GZ43 509188 


AB060225 


Macaca fascicularis brain cDNA clone:QflA 
14955, full insert sequence 


5.70E-07 


462 


3574.D12.GZ43 509348 


M58478 


Human cystic fibrosis transmembrane 
conductance regulator gene, 5' end 


6.00E-05 


463 


3574.E02.GZ43 509189 


L37347 


Human integral membrane protein 
(Nramp2) mRNA, partial 


2.00E-06 


464 


3574.E03.GZ43 509205 


X05817 


Bovine papillomavirus type 4 (BPV-4) 
genome 


6.00E-06 


465 


3574.E14.GZ43 509381 


U67507 


Methanococcus jannaschii section 49 of 150
of the complete genome 


3.40E-05 


466 


3574.F10.GZ43 509318 


M24376 


Mouse zinc finger protein (krox-20) gene, 
exon 1 


3.80E-08 
467


3574.F18.GZ43 509446 


AF184170 


Sparus aurata elongation factor 1-alpha
(EF1-alpha) mRNA, complete cds


3.40E-07 


468 


3574.F23.GZ43 509526 


Z29486 


R.norvegicus (Sprague Dawley) mRNA for
AMP-activated protein kinase 


9.00E-06 


469 


3574.G07.GZ43 509271 


AF064079 


Plasmodium gallinaceum endochitinase 
precursor, mRNA, complete cds 


1.40E-07 


470 


3574.G11.GZ43 509335 


AF032872 


Rattus norvegicus potassium channel 
regulatory protein KChAP mRNA, complete 
cds 


7.40E-07 


471 


3574.H07.GZ43 509272 


J04718 


Human proliferating cell nuclear antigen 
(PCNA) gene, complete cds 


3.10E-07 


472 


3574.I02.GZ43 509193 


AF200361 


Rattus norvegicus cytochrome P450 4F1 
(Cyp4F1) gene, complete cds


1.50E-05 


473 


3574.I07.GZ43 509273 


M29688 


S.cerevisiae PMS1 gene encoding DNA
mismatch repair protein, complete cds 


1.20E-08 


474 


3574.J11.GZ43 509338


Z24104 


H. sapiens (D12S338) DNA segment 
containing (CA) repeat; clone AFM291wd9; 
single read 


3.20E-07 


475 


3574.J14.GZ43 509386


AB008430 


Homo sapiens mRNA for CDEP, complete 
cds 


4.70E-05 


476 


3574.J23.GZ43 509530


AP000384 


Arabidopsis thaliana genomic DNA, 
chromosome 3, P1 clone:MCE21


7.10E-07 


477 


3574.K12.GZ43 509355 


AB031814 


Mus musculus oatp2 mRNA for organic
anion transporting polypeptide 2, complete 
cds 


1.50E-05 


478 


3574.K20.GZ43 509483 


AF126719 


Plasmodium falciparum cAMP-dependent 
protein kinase (pka) gene, complete cds 


3.00E-06 


479 


3574.L07.GZ43 509276 


U53400 


Rattus norvegicus chromosome 10 
microsatellite sequence D10Mco21


8.94E-02 


480 


3574.M03.GZ43_509213


AB000404 


Rice grassy stunt virus genomic RNA6 for 
20.6K major nonstructural protein and 
36.4K protein, complete cds


5.60E-07 


481 


3574.M23.GZ43_509533


U18056 


Lycopersicon esculentum 1-amino-
cyclopropane-1-carboxylate synthase (LE-
ACS1A) gene, complete cds


3.40E-07 


482 


3574.N04.GZ43 509230 


L48479 


Homo sapiens (subclone 6_h1 from P1 H21)
DNA sequence 


3.30E-09 


483 


3574.N10.GZ43 509326 


M58150 


Bovine lactoperoxidase (LPO) mRNA, 
complete cds 


3.60E-05 


484 


3574.N12.GZ43 509358 


AF182950 


Homo sapiens HEX (HEX) gene, partial cds 
and 5' flanking sequence 


9.00E-06 


485 


3574.N20.GZ43 509486 


AE006904 


Sulfolobus solfataricus section 263 of 272 of
the complete genome 


3.00E-06 


486 


3574.P07.GZ43 509280 


U60232 


Homo sapiens cysteine dioxygenase (CDO- 
1) gene, 5' flanking region and exons 1 and 
2 


6.30E-08 


487 


3574.P17.GZ43 509440 


AC002218 


Homo sapiens (subclone 2_c1 from P1 H43)
DNA sequence, complete sequence 


5.30E-08 


488 


3577.A06.GZ43 509633 


U28328 


Bos taurus dinucleotide repeat RM154,
tandem repeat region 


3.40E-27 
489


3577.A18.GZ43 509825 


X58774 


Herpesvirus saimiri sRNA1, sRNA2,
sRNA3 and sRNA4 genes for small viral 
RNAs 


1.00E-06 


490 


3577.B12.GZ43 509730 


BC008400 


Homo sapiens, postmeiotic segregation 
increased (S. cerevisiae) 2, clone 
IMAGE:4273792, mRNA 


2.50E-05 


491 


3577.B15.GZ43 509778 


M61127 


Drosophila melanogaster GTP-binding 
protein (arf-like) gene, complete cds 


1.10E-05 


492 


3577.B19.GZ43 509842 


AF135526 


Homo sapiens clone MINT26 colon cancer
differentially methylated CpG island 
genomic sequence 


1.00E-06 


493 


3577.E19.GZ43 509845 


AF063864 


Schizosaccharomyces pombe essential 
nuclear protein Mcm3p (mcm3+) gene, 
complete cds 


1.00E-06 


494 


3577.F02.GZ43 509574 


U37434 


Danio rerio L-isoaspartate (D-aspartate) O-
methyltransferase (PCMT) mRNA,
complete cds 


5.10E-08 


495 


3577.G07.GZ43 509655 


AF001893 


Human MEN1 region clone epsilon/beta
mRNA 3' fragment 


3.00E-06 


496 


3577.G13.GZ43 509751 


M83821 


Xenopus laevis mucin B.1 consensus repeat
mRNA 


2.10E-07 


497 


3577.H06.GZ43 509640 


AK007565 


Mus musculus 10 day old male pancreas
cDNA, RIKEN full-length enriched library,
clone:1810020K22, full insert sequence 


8.00E-07 


498 


3577.H08.GZ43 509672 


L81912 


Homo sapiens (subclone 2_g5 from PAC
H74) DNA sequence, complete sequence 


2.40E-07 


499 


3577.H18.GZ43 509832 


AL157461 


Homo sapiens mRNA; cDNA 
DKFZp434K152 (from clone 
DKFZp434K152) 


4.00E-06 


500 


3577.I01.GZ43 509561 


U35006 


Carcharhinus plumbeus Ig lambda light 
chain gene, complete cds 


2.00E-06 


501 


3577.I17.GZ43 509817 


AF157252 


Gongronella butleri translation elongation
factor 1-alpha (EF-1alpha) gene, partial cds


1.00E-06 


502 


3577.J04.GZ43 509610 


AF338249 


Sus scrofa thyroid-stimulating hormone
receptor mRNA, complete cds 


2.00E-06 


503 


3577.K06.GZ43 509643 


AB000264 


Bacillus firmus DNA for beta-amylase, 
partial cds 


5.00E-07 


504 


3577.K14.GZ43 509771 


X15441 


Aspergillus nidulans mitochondrial ndhC
and oxiB genes for NADH dehydrogenase 
subunit 3 and cytochrome oxidase subunit II 


1.00E-06 


505 


3577.K23.GZ43 509915 


X52952 


Rat mRNA for c-mos 


3.00E-06 


506 


3577.L10.GZ43 509708 


X60578 


Hepatitis C genomic RNA for putative 
envelope protein (RE56 isolate) 


3.70E-07 


507 


3577.N10.GZ43 509710 


Z75121 


S.cerevisiae chromosome XV reading frame 
ORF YOR213c


4.50E-09 


508 


3577.N14.GZ43 509774 


M90058 


Human serglycin gene, exons 1,2, and 3 


5.00E-06 


509 


3577.O17.GZ43 509823


L19141 


Lupinus albus L-asparaginase gene, 
complete cds 


9.10E-08 






WO 2004/039943 



PCT/US2003/015465



Table 8 



SEQ 
ID 


SEQ NAME 


ACCESSION 


GENBANK DESCRIPTION


GENBANK 
SCORE 


510 


3577.O22.GZ43 509903


AL031008 


Human DNA sequence from clone 360A4 
on chromosome 16. Contains ESTs 
complete sequence [Homo sapiens]


5.60E-08 


511 


3577.P02.GZ43 509584 


AK006176 


Mus musculus adult male testis cDNA,
RIKEN full-length enriched library, 
clone:1700020M10, full insert sequence 


4.60E-08


512 


3577.P07.GZ43 509664 


U05822 


Human proto-oncogene BCL3 gene exon2 


2.40E-14 


513 


3577.P23.GZ43 509920 


AJ010341


Homo sapiens PISSLRE gene, exons 1, 2, 
and 3 and joined CDS 




514 


3580.A04.GZ43 509985 


AJ010213 


Mus musculus beta-dystrobrevin gene, exon
10 


8.20E-07


515 


3580.A09.GZ43 510065 


AB037862 


Homo sapiens mRNA for KIAA1441 
protein, partial cds 


6.30E-15 


516 


3580.A13.GZ43 510129 


U17832 


Symploce pallens mitochondrion 16S 
ribosomal RNA, partial sequence 


7.80E-07 


517 


3580.A14.GZ43 510145 


X89414 


A.thaliana DNA for pyrroline-5-carboxylase
synthetase gene 


6.00E-06 


518 


3580.B01.GZ43 509938 


U67487 


Methanococcus jannaschii section 29 of 150
of the complete genome 


9.00E-05 


519 


3580.C01.GZ43 509939 


X14898


Hamster p7 preinsertion DNA 


2.00E-06


520 


3580.C03.GZ43 509971 


X76302 


H. sapiens RY-1 mRNA for putative nucleic
acid binding protein 


3.70E-07


521 


3580.C05.GZ43 510003 


Z22923 


M.musculus alpha2 (IX) collagen gene,
complete CDS 


1.60E-05 


522 


3580.D07.GZ43 510036 


AB062941


Macaca fascicularis brain cDNA clone:QflA-14927,
full insert sequence


9.80E-22 


523 


3580.D22.GZ43 510276 


M84136 


Flaveria chloraefolia flavonol 4'- 
sulfotransferase mRNA, complete cds 


4.00E-06 


524 


3580.E02.GZ43 509957 


AE001002 


Archaeoglobus fulgidus section 105 of 172 




525 


3580.E08.GZ43 510053 


U48431 


Drosophila pseudoobscura alpha-amylase
(Amy3) pseudogene, complete cds 


3.00E-06 


526 


3580.E10.GZ43 510085 


Z64717 


H. sapiens CpG island DNA genomic MseI
fragment, clone 161e9, forward read
cpg161e9.ft1a


9.60E-19


527 


3580.E19.GZ43 510229 


M64984 


Candida tropicalis open reading frame DNA 
sequence 


2.00E-06 


528 


3580.E21.GZ43 510261 


M84136 


Flaveria chloraefolia flavonol 4'- 
sulfotransferase mRNA, complete cds 


5.00E-06 


529 


3580.E23.GZ43 510293 


AB033570 


Eptatretus burgeri hgPTPR5a mRNA, 
partial cds 


2.00E-06 


530 


3580.G03.GZ43 509975 


Y14277 


Drosophila melanogaster mRNA for nuclear 
protein SA 


1.10E-05 


531 


3580.G13.GZ43 510135 


AK018491 


Mus musculus adult male colon cDNA,
RIKEN full-length enriched library,
clone:9030408N04, full insert sequence


4.40E-08 


532 


3580.G14.GZ43 510151 


AF142660


Lama glama microsatellite LCA90 sequence


2.60E-07 
533


3580.G18.GZ43 510215 


D86226


Spinacia oleracea DNA for nitrate 
reductase, complete cds 


2.60E-05 


534 


3580.G19.GZ43 510231 


U60502 


Glycine max actin (Soy119) gene, partial
cds 


7.00E-06 


535 


3580.G20.GZ43 510247 


D38524 


Human mRNA for 5'-nucleotidase


4.80E-11 


536 


3580.G24.GZ43 510311 


AF084480 


Mus musculus Williams-Beuren syndrome 
deletion transcript 9 homolog (Wbscr9) 
mRNA, complete cds


5.00E-06 


537 


3580.H12.GZ43 510120 


X78423 


D.carota (Queen Anne's Lace) Inv*Dc3 
gene, 4444bp 


4.00E-06 


538 


3580.H16.GZ43 510184 


Y13786 


Homo sapiens mRNA for meltrin- 
beta/ADAM 19 homologue 


4.50E-10 


539 


3580.H22.GZ43 510280 


X62578 


C.caldarium plastid genes ompR', psbD,
psbC, rps16 and groEL


2.50E-05 


540 


3580.I06.GZ43 510025 


X51344 


Spiroplasma virus (SpV1-R8A2 B)
complete genome 


4.70E-07 


541 


3580.I08.GZ43 510057 


X02761 


Human mRNA for fibronectin (FN 
precursor) 


1.02E-04 


542 


3580.I18.GZ43 510217 


BC007856 


Homo sapiens, clone MGC:14337
IMAGE:4298428, mRNA, complete cds 


2.60E-10 


543 


3580.J10.GZ43 510090 


AF068206 


Rangifer tarandus microsatellite NVHRT16
sequence


4.40E-11 


544 


3580.J12.GZ43 510122 


AE008323 


Agrobacterium tumefaciens strain C58
linear chromosome, section 127 of 187 of
the complete sequence 


9.30E-05 


545 


3580.J18.GZ43 510218


AF222689 


Homo sapiens protein arginine N- 
methyltransferase 1 (HRMT1L2) gene, 
complete cds, alternatively spliced 


1.50E-05 


546 


3580.J20.GZ43 510250 


M31651 


Homo sapiens sex hormone-binding 
globulin (SHBG) gene, complete cds 


3.80E-07 


547 


3580.J21.GZ43 510266 


AB054062 


Pagrus major lpl mRNA for lipoprotein 
lipase, complete cds 


3.00E-06 


548 


3580.K03.GZ43 509979 


AE007607 


Clostridium acetobutylicum ATCC824 
section 95 of 356 of the complete genome 


5.00E-05 


549 


3580.K05.GZ43 510011 


Z15027 


H. sapiens HLA class III DNA 


3.70E-08 


550 


3580.K21.GZ43 510267 


AF135826 


Mus musculus neuronal nitric oxide 
synthase (NOS-I) gene, exon lc and 5'- 
flanking sequence 


2.20E-09 


551 


3580.L09.GZ43 510076 


AL049333 


Homo sapiens mRNA; cDNA 
DKFZp564M116 (from clone
DKFZp564M116) 


3.40E-13 


552 


3580.L10.GZ43 510092 


AF278587 


Borrelia burgdorferi strain BC-1 outer 
surface protein C (ospC) gene, partial cds 


2.00E-06 


553 


3580.L12.GZ43 510124 


D14664 


Human mRNA for KIAA0022 gene, 
complete cds 


1.10E-05 


554 


3580.L13.GZ43 510140


K02269 


Human ERV3 (endogenous retrovirus 3) 
gag gene 


3.30E-07 


555 


3580.L17.GZ43 510204 


U60232 


Homo sapiens cysteine dioxygenase (CDO- 
1) gene, 5' flanking region and exons 1 and 
2 


2.00E-07 
556


3580.M01.GZ43 509949


U53400 


Rattus norvegicus chromosome 10 
microsatellite sequence D10Mco21 


4.54E-01 


557 


3580.M16.GZ43 510189


AE006406 


Lactococcus lactis subsp. lactis IL1403 
section 168 of 218 of the complete genome 


3.00E-06 


558 


3580.M17.GZ43_510205


AF348584 


Arabidopsis thaliana unknown protein 
(T8K14.7) mRNA, complete cds 


6.70E-07 


559 


3580.M18.GZ43_510221


X69908 


H.sapiens gene for mitochondrial ATP 
synthase c subunit (P2 form)


1.00E-05 


560 


3580.M23.GZ43_510301


M17326 


Mouse endogenous murine leukemia virus 
polytropic provirus DNA, complete cds 


9.00E-06 


561 


3580.N10.GZ43 510094 


AF103970 


Lasioglossum rohweri cytochrome oxidase I 
(COI) gene, mitochondrial gene encoding
mitochondrial protein, partial cds 


1.00E-06 


562 


3580.N11.GZ43 510110 


Z80362 


H.sapiens HLA-DRB pseudogene, exon 1; 


6.10E-11 


563 


3580.N14.GZ43 510158 


AB014462 


Xenopus laevis XNLRR-1 mRNA, complete 
cds 


1.60E-05 


564 


3580.N15.GZ43 510174 


AF164381 


Anomochloa marantoidea maturase (matK) 
gene, complete cds; chloroplast gene for 
chloroplast product 


1.00E-06 


565 


3580.N23.GZ43 510302 


AB047880 


Macaca fascicularis brain cDNA, 
clone:QnpA-14303 


2.00E-06 


566 


3580.O02.GZ43 509967 


X55948 


H. aspersa cytoplasmic intermediate 
filament gene exons 2 to 6 


4.00E-06 


567 


3580.O06.GZ43 510031 


L34649 


Homo sapiens platelet/endothelial cell 
adhesion molecule-1 (PECAM-1) gene,
exon 14 


4.00E-06 


568 


3580.O07.GZ43 510047 


Z30183 


H.sapiens mig-5 gene


3.00E-05 


569 


3580.O08.GZ43 510063 


AF101385 


Homo sapiens ribosomal protein L11 gene,
complete cds 


1.80E-08 


570 


3580.P04.GZ43 510000 


AC016707 


Homo sapiens BAC clone RP11-221K4 
from Y, complete sequence


1.80E-08 


571 


3580.P05.GZ43 510016 


AF055482 


Thermotoga neapolitana galactose 
utilization operon, complete sequence 


8.00E-07 


572 


3580.P14.GZ43 510160 


AF009133 


Rattus norvegicus CD94 (Cd94) mRNA, 
complete cds 


7.50E-08 


573 


3580.P19.GZ43 510240 


Y15176 


Human papillomavirus type 80 E6, E7, E1,
E2, E4, L2, and L1 genes


7.00E-06 


574 


3583.B06.GZ43 510402 


X51398 


Chlamydomonas moewusii chloroplast 
DNA for ORF 563 and transfer RNA-Thr 


3.00E-06 


575 


3583.B07.GZ43 510418 


U39382 


Hexachaeta amabilis 16S ribosomal RNA
gene, mitochondrial gene encoding 
mitochondrial RNA, partial sequence 


5.50E-08 


576 


3583.B10.GZ43 510466 


S45332 


erythropoietin receptor [human, placental, 
Genomic, 8647 nt] 


3.90E-10 


577 


3583.B11.GZ43 510482 


AC006623


Caenorhabditis elegans clone C52E2, 
complete sequence 


4.00E-06 
578


3583.D15.GZ43 510548 


AF242297 


Homo sapiens phosducin-like protein gene, 
promoter and exon 1 


3.80E-08 


579 


3583.D22.GZ43 510660 


Z23548 


H. sapiens (D10S540) DNA segment 
containing (CA) repeat; clone 
AFM205xe11; single read


3.20E-07 


580 


3583.E11.GZ43 510485 


X69737 


E.esula chloroplast rbcL gene for ribulose- 
1,5-biphosphate-carboxylase and promoter 
region 


1.30E-08 


581 


3583.E13.GZ43 510517 


AB007856 


Homo sapiens KIAA0396 mRNA, partial 
cds 


2.20E-05 


582 


3583.E15.GZ43 510549 


X74131 


H.nelsoni small subunit ribosomal RNA


7.00E-06 


583 


3583.E17.GZ43 510581 


AE006633 


Streptococcus pyogenes M1 GAS strain
SF370, section 162 of 167 of the complete 
genome 


2.40E-07 


584 


3583.F24.GZ43 510694 


J02846 


Human tissue factor gene, complete cds


7.40E-07 


585 


3583.G09.GZ43 510455 


X88789 


P.sativum mRNA for starch synthase (2035 
bp) 


2.10E-05 


586 


3583.G16.GZ43 510567 


AK000735 


Homo sapiens cDNA FLJ20728 fis, clone
HEP11763 


4.70E-07 


587 


3583.G17.GZ43 510583 


AK026822 


Homo sapiens cDNA: FLJ23169 fis, clone 
LNG09957 


2.60E-05 


588 


3583.G21.GZ43 510647 


U13044 


Human nuclear respiratory factor-2 subunit
alpha mRNA, complete cds 


2.00E-06 


589 


3583.H03.GZ43 510360 


M26222 


African green monkey origin of replication 
(ORS9) region 


1.00E-13 


590 


3583.H12.GZ43 510504 


X01669 


Human c-k-ras oncogene exon 2 from lung 
carcinoma pr310


3.20E-08 


591 


3583.H13.GZ43 510520 


AK022380 


Homo sapiens cDNA FLJ12318 fis, clone
MAMMA1002068 


2.00E-06 


592 


3583.H15.GZ43 510552 


L77119 


Methanococcus jannaschii small extra- 
chromosomal element, complete sequence


1.60E-05 


593 


3583.J02.GZ43 510346 


AJ007302 


Sus scrofa triadin gene 


1.00E-06 


594 


3583.K08.GZ43 510443 


D63902 


Mouse mRNA for estrogen-responsive 
finger protein, complete cds 


2.50E-11 


595 


3583.K10.GZ43 510475 


U11816 


Lactobacillus strain 30A ornithine 
decarboxylase (odd) gene, complete cds 


1.00E-05 


596 


3583.K11.GZ43 510491 


X73416 


W.suaveolens mitochondrial orf 1 


6.00E-06 


597 


3583.K14.GZ43 510539 


U04367 


Bacillus thuringiensis dakota HD511 CryIII
delta-endotoxin gene, partial cds


1.20E-05 


598 


3583.K17.GZ43 510587 


AE004129 


Vibrio cholerae chromosome I, section 37 of 
251 of the complete chromosome 


8.00E-06 


599 


3583.K23.GZ43 510683 


AE001410 


Plasmodium falciparum chromosome 2, 
section 47 of 73 of the complete sequence 


4.00E-06 


600 


3583.L05.GZ43 510396 


X55299 


C.stercorarium celZ gene for endo-beta-1,4- 
glucanase (Avicelase I) 


1.00E-05 


601 


3583.L08.GZ43 510444 


AF106953 


Homo sapiens SOS1 (SOS1) gene, partial 
cds 


7.50E-09 


602 


3583.L09.GZ43 510460 


L34842 


Soybean chloroplast phytochrome A (phyA) 
gene, complete cds 


2.40E-05 
603


3583.L17.GZ43 510588 


X65223 


T.rubrum mitochondrion genes for
cytochrome oxidase I, cytochrome oxidase 
II, ATPase 9, NADH dehydrogenase subunit 
4L, NADH dehydrogenase subunit 5,
tRNA-Gln, tRNA-Met and tRNA-Arg


5.00E-06 


604 


3583.L21.GZ43 510652 


AF106661 


Rattus norvegicus glutathione S-transferase 
Yb4 (GstYb4) gene, complete cds 


5.00E-06 


605 


3583.M08.GZ43_510445


BC005276


Homo sapiens, Similar to GRO2 oncogene,
clone IMAGE:4071652, mRNA


3.70E-07 


606 


3583.M10.GZ43_510477


Y00477 


Human bone marrow serine protease gene 
(medullasin) (leukocyte neutrophil elastase 
gene) 


4.70E-09 


607 


3583.M13.GZ43_510525


X73030 


S.cerevisiae YGP1 gene


7.00E-06 


608 


3583.N09.GZ43 510462 


AK018377 


Mus musculus 16 days embryo lung cDNA, 
RIKEN full-length enriched library, 
clone:8430403M08, full insert sequence 


4.60E-07 


609 


3583.O03.GZ43 510367 


X72698 


P.pygmaeus ZFY gene for Y-linked Zinc
finger protein, final intron 


3.00E-06 


610 


3583.O11.GZ43 510495


U40161 


Arabidopsis thaliana type 2A protein 
serine/threonine phosphatase 55 kDa B 
regulatory subunit mRNA, complete cds 


2.00E-06 


611 


3583.O17.GZ43 510591


U67567 


Methanococcus jannaschii section 109 of
150 of the complete genome 


2.00E-06 


612 


3583.P09.GZ43 510464 


AK021312 


Mus musculus 13 days embryo stomach 
cDNA, RIKEN full-length enriched library, 
clone:D530039A21, full insert sequence 


3.60E-08 


613 


3583.P19.GZ43 510624 


U12920 


Caenorhabditis elegans sex determination 
(tra-3) gene, exons 2-6 


1.60E-05 


614 


3583.P22.GZ43 510672 


AJ133800 


Homo sapiens CPNE7 gene (partial), exon 2 


7.60E-07 


615 


3590.A12.GZ43 512274 


AF185661 


Glomus intraradices strain FL208 18S
ribosomal RNA, partial sequence; internal 
transcribed spacer 1, 5.8S ribosomal RNA 
and internal transcribed spacer 2, complete 
sequence; 26S ribosomal RNA, partial 
sequence 


2.00E-06 


616 


3590.B01.GZ43 512099 


M96068 


Madagascar periwinkle
hydroxymethylglutaryl-CoA reductase
(HMGR) mRNA, complete cds 


7.40E-09 


617 


3590.B16.GZ43 512339 


V01527 


Mouse gene coding for major 
histocompatibility antigen. This is a class II 
antigen, I-A-beta 


2.40E-12 


618 


3590.B21.GZ43 512419 


AB028983 


Homo sapiens mRNA for KIAA1060 
protein, partial cds 


1.70E-05 


619 


3590.C20.GZ43 512404 


D86566 


Human DNA for NOTCH4, partial cds 


3.20E-07




620 


3590.D03.GZ43 512133 


D10371 


Phocine distemper virus (PDV) genomic 
RNA for N, P, V, C, M, F, H and L protein


2.90E-05


621 


3590.D19.GZ43 512389 


M96163 


Mus musculus (clone 2) serum inducible 
kinase (SNK) mRNA, mRNA sequence




622 


3590.D23.GZ43 512453 


AF086485 


Homo sapiens full length insert cDNA clone 
ZD93E02 


7.70E-09 


623 


3590.E08.GZ43 512214 


AF055278 


Homo sapiens DNA repair protein XRCC4
(XRCC4) gene, exon 1 


5.90E-12 


624 


3590.E10.GZ43 512246 


AE001477 


Helicobacter pylori, strain J99 section 38 of 


1 f\C\TJ fl£ 

z.uuii-uo 


625 


3590.F01.GZ43 512103 


AF080395 


Entamoeba histolytica actin binding protein
(abp2) mRNA, partial cds 


2.00E-06 


626 


3590.F16.GZ43 512343 


X79388 


B.subtilis (168) prkA gene


1.20E-05


627 


3590.G01.GZ43 512104 


U32690 


Haemophilus influenzae Rd section 5 of 163 




628 


3590.G02.GZ43 512120 


U68040 


Cochliobolus heterostrophus polyketide 




629 


3590.H04.GZ43 512153 


X66013 


T.aestivum gene for cathepsin B (All 6) 


2.50E-07 


630 


3590.H06.GZ43 512185 


X66177


M.musculus mRNA for Hox 2.7 protein 


8.00E-06 


631 


3590.H09.GZ43 512233 


AF012899


protein precursor mRNA, complete cds


3.40E-11


632 


3590.H12.GZ43 512281 


Y15724 


Homo sapiens SERCA3 gene, exons 1-7
(and joined CDS) 




633 


3590.H16.GZ43 512345 


AF064079 


Plasmodium gallinaceum endochitinase
precursor, mRNA, complete cds 


6.70E-09 


634 


3590.I16.GZ43 512346 


L06280 


Drosophila melanogaster adenine 
phosphoribosyltransferase (APRT) gene, 
complete cds 


4.40E-07 


635 


3590.J01.GZ43 512107


X69573 


T.reesei xynl gene, complete CDS


1.70E-07 


636 


3590.J02.GZ43 512123


AF092047 


Homo sapiens homeobox protein Six3 
(SIX3) gene, complete cds


4.00E-06 


637 


3590.J18.GZ43 512379 


AB027966 


Schizosaccharomyces pombe gene for 
Hypothetical protein, partial cds, 
clone:TB89 


2.60E-08 


638 


3590.J21.GZ43 512427 


AK014727 


Mus musculus 0 day neonate head cDNA, 
RIKEN full-length enriched library, 
clone:4833419G08, full insert sequence 


7.90E-08 


639 


3590.J22.GZ43 512443 


AK020136 


Mus musculus 12 days embryo male 
wolffian duct includes surrounding region 
cDNA, RIKEN full-length enriched library,
clone:6720460K10, full insert sequence 


5.90E-08 


640 


3590.K06.GZ43 512188 


AF171890 


Trimeresurus trigonocephalus cytochrome b 
(cytb) gene, partial cds; mitochondrial gene
for mitochondrial product 


3.00E-06 


641 


3590.K10.GZ43 512252 


U16775 


Human immunodeficiency virus type 1 
isolate VE6 reverse transcriptase (pol) gene, 
partial cds 


6.00E-06 
642


3590.K19.GZ43 512396 


U40454 


Candida albicans topoisomerase type I 
(CATOP1) gene, complete cds 


3.00E-06 


643 


3590.L08.GZ43 512221 


U52198 


Vibrio anguillarum flagellin E (flaE), 
flagellin D (flaD), and flagellin B (flaB) 
genes, complete cds, and (flaG) gene, 
partial cds 


2.00E-05 


644 


3590.L10.GZ43 512253 


U01155 


Xenopus laevis angiotensin II receptor
mRNA, complete cds 


4.00E-06 


645 


3590.M03.GZ43_512142


AF252499 


Bos taurus clone MNB-88 microsatellite 
sequence 


4.60E-08 


646 


3590.M04.GZ43_512158


AE007607 


Clostridium acetobutylicum ATCC824 
section 95 of 356 of the complete genome 


4.50E-05 


647 


3590.M09.GZ43_512238


L04758 


Oryctolagus cuniculus cytochrome P-450
(CYP4A4) gene, 5' end 


1.00E-06 


648 


3590.N04.GZ43 512159 


Z82038 


C.thermosaccharolyticum etfB, etfA, hbd,
thlA and actA genes


2.00E-06 


649 


3590.N19.GZ43 512399 


U15603 


Saccharomyces cerevisiae Csd3p (CSD3) 
gene, complete cds 


4.00E-06 


650 


3590.N21.GZ43 512431 


L19535 


Drosophila subobscura sry alpha gene, 
complete cds 


6.00E-06 


651 


3590.O08.GZ43 512224 


L36588 


Homo sapiens intron-encoded U22 small 
nucleolar RNA (UHG) gene 


4.30E-07 


652 


3596.C02.GZ43 512500 


L14849


Drosophila melanogaster cytoplasmic 
protein tyrosine phosphatase (PTP61F) 
mRNA, complete cds 


8.90E-09 


653 


3596.C20.GZ43 512788 


M60286 


Herpesvirus saimiri immediate early region 
protein genes, complete cds 


1.30E-07 


654 


3596.C22.GZ43 512820 


X15121 


Soybean Gy1 gene for glycinin subunit G1


1.00E-06 


655 


3596.D01.GZ43 512485 


Z78414 


Caenorhabditis elegans cosmid W09D12, 
complete sequence 


4.00E-06 


656 


3596.D07.GZ43 512581 


M88242 


Mouse glucocorticoid-regulated inflammatory
prostaglandin G/H synthase (griPGHS)
mRNA, complete cds 


1.70E-05 


657 


3596.D09.GZ43 512613 


X99710 


L.lactis ORF, genes homologous to vsf-1 
and pepF2 and gene encoding protein
homologous to methyltransferase


5.00E-06 


658 


3596.D17.GZ43 512741 


AF200361 


Rattus norvegicus cytochrome P450 4F1
(Cyp4F1) gene, complete cds


1.40E-05 


659 


3596.E08.GZ43 512598 


AF111848


Homo sapiens PRO0529 mRNA, complete 
cds 


5.00E-06 


660 


3596.E22.GZ43 512822 


X58178 


S.pyogenes for emm41 gene


5.00E-06 


661 


3596.F10.GZ43 512631 


AL390161 


Homo sapiens mRNA; cDNA 
DKFZp761P0615 (from clone 
DKFZp761P0615) 


2.00E-06 


662 


3596.G13.GZ43 512680 


AJ000044 


Tenebrio molitor LPCP29 gene


2.00E-06 


663 


3596.H04.GZ43 512537 


U65018 


Dictyostelium discoideum 
mannosyltransferase gene, complete cds


3.60E-07 


664 


3596.H10.GZ43 512633 


AF104390 


Penaeus monodon hyperglycemic hormone 
homolog PmSGP-V precursor, mRNA,
complete cds 


2.00E-06 
665


3596.H17.GZ43 512745 


D28915 


Human gene for hepatitis C-associated 
microtubular aggregate protein p44, exon 9 
and complete cds 


1.00E-06 


666 


3596.H22.GZ43 512825 


AF198250 


Dictyostelium discoideum lim2 protein 
(limB) mRNA, complete cds 


7.30E-07 


667 


3596.I06.GZ43 512570 


U32444 


Solanum lycopersicum phytochrome F
(PHYF) gene, partial cds 


1.10E-05 


668 


3596.I16.GZ43 512730 


U32444 


Solanum lycopersicum phytochrome F
(PHYF) gene, partial cds 


8.00E-06


669 


3596.J04.GZ43 512539 


D28596 


Chicken gene for c-maf proto-oncogene
product c-Maf, short form complete cds and 




670 


3596.J13.GZ43 512683 


AB007856 


Homo sapiens KIAA0396 mRNA, partial 
cds 


2.40E-05 


671 


3596.K14.GZ43 512700 


AC024752 


Caenorhabditis elegans cosmid Y1B5A, 


3.00E-06 


672 


3596.K15.GZ43 512716 


Y00469 


Yeast mRNA for profilin


2.00E-06 


673 


3596.L01.GZ43 512493 


X79703 


O.aries gene for beta-casein


4.00E-06 


674 


3596.L08.GZ43 512605 


AJ007313 


Streptomyces coelicolor sigT, trxB and trxA
genes, and ORF1 and ORF2 


9.80E-07


675 


3596.L13.GZ43 512685 


AK018239 


Mus musculus adult male medulla 
oblongata cDNA, RIKEN full-length 
enriched library, clone:6330563C09, full 
insert sequence 


1.00E-06 


676 


3596.N02.GZ43 512511 


AE001387 


Plasmodium falciparum chromosome 2, 
section 24 of 73 of the complete sequence 


1.00E-06 


677 


3596.N12.GZ43 512671 


Z12841 


O. cuniculus mRNA for phospholipase 


4.00E-06 


678 


3596.N15.GZ43 512719 


U14186 


Bos taurus general vesicular transport factor 
p115 mRNA, complete cds


1.70E-05 


679 


3596.N16.GZ43 512735 


U41106 


Caenorhabditis elegans cosmid W06A11 


1.10E-05 


680 


3596.N21.GZ43 512815 


AF097717 


Homo sapiens 3'-phosphoadenosine 5'-
phosphosulfate synthetase (PAPSS) exon 8


1.40E-07 


681 


3596.O10.GZ43 512640 


AE001649 


Chlamydia pneumoniae section 65 of 103 of 
the complete genome 


1.10E-05 


682 


3596.O12.GZ43 512672


AC006623 


Caenorhabditis elegans clone C52H2, 
complete sequence 


4.00E-06 


683 


3596.P03.GZ43 512529 


X82317 




1.49E-03 


684 


3596.P04.GZ43 512545 


AF111855 


Agrobacterium tumefaciens RNA 
polymerase alpha subunit (rpoA) gene, 
complete cds 


2.00E-06 


685 


3596.P07.GZ43 512593 


L40817 


Homo sapiens muscle-specific DNase I-like
(DNL1L) gene, exons 1-9, complete cds 


3.00E-06 


686 


3596.P08.GZ43 512609 


M14505 


Human (clone PSK-J3) cyclin-dependent 
protein kinase mRNA, complete cds


5.00E-06 


687 


3596.P10.GZ43 512641 


M73770 


P.falciparum RNA polymerase III largest
subunit gene, complete cds 


2.90E-05 
688


3596.P21.GZ43 512817 


S82725 


NPM/ALK=fusion gene {translocation 
breakpoint} [human, lymphoma cells SU- 
DHL-1, Genomic, 1679 nt] 


1.00E-07 


689 


3599.A04.GZ43 512914 


X83212 


H.sapiens tryptophan hydroxylase gene, 
promoter region 


5.50E-07 


690 


3599.A23.GZ43 513218 


U05259 


Human MB-1 gene, complete cds 


2.10E-05 


691 


3599.B15.GZ43 513091 


AF277068 


HIV-1 clone QH0791 from Trinidad and 
Tobago, envelope protein (env) gene, 
complete cds 


6.10E-07 


692 


3599.B16.GZ43 513107 


M60517 


Chicken vitronectin receptor alpha subunit 
mRNA, complete cds 


4.00E-06 


693 


3599.C03.GZ43 512900 


AB021267 


Arabidopsis thaliana copia-like 
retrotransposon AtRE2-2 gene for 
polyprotein, complete cds 


2.00E-06 


694 


3599.C17.GZ43 513124 


U28055 


Homo sapiens hepatocyte growth factor-like 
protein homolog mRNA, partial cds 


3.00E-06 


695 


3599.D03.GZ43 512901 


L43550 


Buchnera aphidicola anthranilate synthase
small subunit (trpG) gene, anthranilate 
synthase large subunit (trpE) gene, complete 
cds 


3.00E-06 


696 


3599.D05.GZ43 512933 


AL023779 


S.pombe chromosome II cosmid c244


2.00E-06 


697 


3599.D07.GZ43 512965 


AL391223 


Human chromosome 14 DNA sequence 
Partial sequence from BAC R-325N7_PCR1 
of library RPCM1 from chromosome 14 of 
Homo sapiens (Human), complete sequence 


5.00E-06 


698 


3599.D10.GZ43 513013 


AF064079 


Plasmodium gallinaceum endochitinase
precursor, mRNA, complete cds 


1.70E-07 


699 


3599.E01.GZ43 512870 


U09184 


Buchnera aphidicola ferredoxin-NADP+
reductase (fprl) gene, partial cds; 
anthranilate synthase large subunit (trpE) 
and anthranilate synthase small subunit 
(trpG) genes, complete cds; heat shock 
protein (hslU) gene, partial cds; and 
unknown gene 


9.60E-07 


700 


3599.E05.GZ43 512934 


X60145 


Human J-alpha segment J-alpha FR9 
mRNA for J-alpha region of T-cell receptor 


1.20E-05 


701 


3599.F17.GZ43 513127 


U27037 


Fistulina hepatica mitochondrial small 
subunit ribosomal RNA, mitochondrial 
gene, partial sequence 


2.00E-06 


702 


3599.F24.GZ43 513239 


Z78414 


Caenorhabditis elegans cosmid W09D12, 
complete sequence 


5.00E-06 


703 


3599.H05.GZ43 512937 


AF032891 


Camponotus consobrinus microsatellite-
containing sequence Ccon12


2.10E-08 


704 


3599.H23.GZ43 513225 


AB024553 


Bacillus halodurans DNA, complete and 
partial cds, strain:C-125 


4.70E-07 


705 


3599.J11.GZ43 513035 


AB025112 


Xenopus laevis XGC-2 mRNA for guanylyl 
cyclase-2, complete cds 


3.00E-06 


706 


3599.K02.GZ43 512892 


AJ224474 


Borrelia burgdorferi left chromosomal
subtelomeric region (truA gene) 


3.00E-06 





707 


3599.K04.GZ43 512924 


X99710 


L.lactis ORF, genes homologous to vsf-1
and pepF2 and gene encoding protein
homologous to methyltransferase


5.00E-06 


708 


3599.K23.GZ43 513228 


AF074247 


Homo sapiens neuronal delayed-rectifier 
voltage-gated potassium channel splice 
variant (KCNQ2) mRNA, complete cds 


8.00E-07 


709 


3599.L04.GZ43 512925 


X59773 


Pisum sativum mRNA for P protein, a part 
of glycine cleavage complex 


1.40E-05 


710 


3599.L15.GZ43 513101 


U34282 


Rattus norvegicus fast skeletal muscle 
sarcoplasmic reticulum Ca-ATPase 
(SERCA1) gene, 5'-flanking sequence 


2.00E-06 


711 


3599.M04.GZ43 512926


AK018953 


Mus musculus adult male testis cDNA,
RIKEN full-length enriched library,
clone:1700111D04, full insert sequence


2.30E-11 


712 


3599.M22.GZ43 513214


AB052179 


Macaca fascicularis brain cDNA,
clone:QnpA-21934 


4.70E-07 


713 


3599.M24.GZ43 513246


AE003394 


Drosophila melanogaster genomic scaffold 
142000013386028, complete sequence 


7.30E-07 


714 


3599.N09.GZ43 513007 


X16362


Rat SPI-2 serine protease inhibitor gene 


1.19E-04 


715 


3599.N16.GZ43 513119 


X92421 


X.laevis mRNA for RNA helicase p54


3.00E-06 


716 


3599.N20.GZ43 513183 


M59447 


Drosophila melanogaster Sex-lethal (Sxl)
mRNA, complete cds 


2.00E-06 


717 


3599.N24.GZ43 513247 


AC005485 


Homo sapiens PAC clone RP5-998M2 from 
7q33-q35, complete sequence 


2.00E-07 


718 


3599.O06.GZ43 512960 


AJ131667 


Escherichia coli plasmid pSF0157


2.00E-06 


719 


3599.O17.GZ43 513136


X96607 


M.musculus IgH 3' alpha enhancer DNA 


8.10E-05 


720 


3599.P05.GZ43 512945 


X77111 


N.tabacum chi-V gene


1.50E-07 


721 


3602.A09.GZ43 513378 


AF015303 


Xenopus laevis small GTPase Ran binding 
protein 1 mRNA, complete cds 


1.10E-05 


722 


3602.B18.GZ43 513523 


L18892 


Tetrahymena thermophila histone (H2A.1)
gene, complete cds 


5.70E-07 




723


3602.B21.GZ43 513571


BC005233 


Homo sapiens, clone MGC:12257
IMAGE:3950129, mRNA, complete cds 


1.60E-10 


724 


3602.B22.GZ43 513587 


X71765 


P. falciparum gene for Ca2+-ATPase


1.00E-06 


725 


3602.C24.GZ43 513620 


AL080106 


Homo sapiens mRNA; cDNA 
DKFZp566O053 (from clone 
DKFZp566O053) 


2.00E-06 


726 


3602.D06.GZ43 513333 


AF098970 


Phaseolus vulgaris NBS-LRR-like protein 
cD7 (CO-2) mRNA, partial cds 


1.70E-07 


727 


3602.D11.GZ43 513413 


M59770 


P.falciparum calmodulin gene, complete cds 


2.20E-07 


728 


3602.E04.GZ43 513302 


X53582 


Zea mays ZMPMS1 gene for 19 kDa zein 
protein 


1.30E-05 


729 


3602.E06.GZ43 513334 


L38718 


Providencia stuartii (clone pSK.aarP)
transcriptional activator (aarP) gene, 
complete cds 


7.90E-07 


730 


3602.E13.GZ43 513446 


U58106 


Blomia tropicalis allergen mRNA, complete
cds 


1.70E-07 
731


3602.E21.GZ43 513574


M15085 


T.brucei expressed copy of the ILTat 1.3 
variable surface glycoprotein gene, 5' flank 




732 


3602.F12.GZ43 513431 


X64802 


H. sapiens F8 mRNA for Interleukin-1-like
species 


3.40E-58 


733 


3602.G03.GZ43 513288 


AF036148 


cds 


2.00E-06 


734 


3602.G17.GZ43 513512 


U41106 


Caenorhabditis elegans cosmid W06A11 


1.30E-05 




735


3602.I07.GZ43 513354


AF000941 


2-6 of locus control region (LCR) for T-cell 


1.20E-05 


736 


3602.I11.GZ43 513418 


AL133620 


Homo sapiens mRNA; cDNA 
DKFZp434F0621 (from clone 
DKFZp434F0621) 


3.00E-06 


737 


3602.I15.GZ43 513482 


U23479 


Dictyostelium discoideum 
phosphatidylinositol 4-kinase (PIK4)
mRNA, complete cds 


8.00E-07 


738 


3602.J13.GZ43 513451 


AK025319 


Homo sapiens cDNA: FLJ21666 fis, clone 
COL08915 


3.30E-07 


739 


JOU2.JVUJ. IjZ;4o Jl3J.yJ. 


X85811 


S.cerevisiae tRNA-Leu and ORF's N2212
N2215 N2219 N2223 N2227 N2231 


1.10E-05




740


3602.K06.GZ43 513340


AF133052 


Walleye epidermal hyperplasia virus type 2 
long terminal repeat, complete sequence; 
gag polyprotein (gag-pol) gene, complete 
cds; pol polyprotein (gag-pol) gene, partial 
cds; envelope polyprotein (env) and cyclin 


4.00E-06 


741 


3602.L20.GZ43 513565 


M62717 


Human CSP-B gene flanking sequence 


1.10E-05 


742 


3602.N03.GZ43 513295 


Z81126 


complete sequence 


5.70E-05 




■s^fio xrn^ CV7A.1 




Human OBR gene, intron sequence 
immediately adjacent to the 5' end of coding 


1.00E-06 


744 


'X^f\< A 1 ^ rrrrA'X ^I^Q^S 

juUj.AI J.gZ'f J jIjojo 


Z46507 


Bovine herpesvirus type 4 genomic DNA 
region (V.TEST) 


5.00E-06 


745 


3605.C16.gz43 513876 


AF282517 


Homo sapiens clone 10ptel c6f7 sequence


9.40E-08 


746 


3605.E19.gz43 513926 


Z22923 


complete CDS 


2.10E-05 


747 


3605.G13.gz43 513832


AJ132752 


Gadus morhua mRNA for beta2-
microglobulin, clone b3 


1.30E-05 


748 


3605.H10.gz43 513785 


AF257480 


Rana temporaria microsatellite SB80
sequence 


4.10E-09 


749 


3605.H21.gz43 513961 


X63507 


M.musculus HOX-3.5 gene


7.80E-05 


750 


3605.I19.gz43 513930 


AK002100 


Homo sapiens cDNA FLJ11238 fis, clone
PLACE1008532 


3.30E-11 


751 


3605.J16.gz43 513883 


AF039197 


Gallus gallus Pax-9 gene, putative 5' 
regulatory sequence 


1.00E-07 
752


3605.K19.gz43 513932 


X63853 


S.cerevisiae MAT locus genes BUD5, mat-
alpha1, mat-alpha2, YCR724 and YCR725


8.00E-06 


753 


3605.M17.gz43 513902 


M30931 


Simian immunodeficiency virus (SIV) 
proviral, complete genome 


3.70E-05 


754 


3605.N04.gz43 513695 


AF169388 


alpha 4 collagen IV (Col4a4) mRNA, complete cds


8.90E-05 


755 


3605.N09.gz43 513775 


AF029111


Adelius sp. 16S ribosomal RNA gene, 
mitochondrial gene for mitochondrial RNA, 


2.80E-07


756 


3605.N12.gz43 513823 


BC000358 


Homo sapiens, protein kinase, AMP- 
activated, gamma 1 non-catalytic subunit, 
clone MGC:8666 IMAGE:2964434, mRNA, 
complete cds 


3.90E-47 


757 


3605.N16.gz43 513887 


X95301 


D.rerio mRNA for HER-5 protein 


1.00E-06 


758 


3608.B06.gz43 514099 


X00004 


B.taurus gene encoding pituitary glycoprotein


6.30E-08 


759 


3608.B12.gz43 514195 


X00525 


Mouse 28S ribosomal RNA 


3.10E-13 


760 


3608.B24.gz43 514387 


AF269848 


clone step.l026e06 genomic sequence 


2.00E-06 


761 


3608.C18.gz43 514292 


BC000387 


Homo sapiens, U6 snRNA-associated Sm-
like protein, clone MGC:8433
IMAGE:2821171, mRNA, complete cds


2.50E-10 


762 


3608.E17.gz43 514278 


BC008245 


Homo sapiens, clone IMAGE:3875012,
mRNA 


1.00E-06 


763 


3608.E20.gz43 514326 


U86646 


Ailurus fulgens beta casein gene, exon 7, 
partial cds 


4.70E-07 


764 


3608.F13.gz43 514215 


AF125672 


Homo sapiens silencing mediator of retinoic 
acid and thyroid hormone receptor extended 
isoform (SMRTE) mRNA, complete cds 


2.00E-06 


765 


3608.G09.gz43 514152 


AE001066 


Archaeoglobus fulgidus section 41 of 172 of 


4.00E-06 


766 


3608.H05.gz43 514089 


AJ224981 


Mus musculus calpain 3 gene, exon 1


3.00E-06 


767 


3608.H14.gz43 514233 


AE007394 


194 of the complete genome 


3.20E-05 




3608.H18.gz43 514297


Z36046 


S.cerevisiae chromosome II reading frame
ORF YBR177c 


7.00E-06


769 


3608.J17.gz43 514283 


AF024648 


Arabidopsis thaliana receptor-like 
serine/threonine kinase (RKF1) mRNA, 
complete cds 


8.00E-06 


770 


3608.J24.gz43 514395 


AJ002258 


Rattus norvegicus mRNA for Prx3A protein


3.60E-07 


771 


3608.K03.gz43 514060 


M83199 


Simmondsia chinensis stearoyl-acyl carrier 
protein desaturase mRNA, complete cds 


2.50E-07 


772 


3608.K14.gz43 514236 


AK026999 


Homo sapiens cDNA: FLJ23346 fis, clone 
HEP13716 


2.00E-06 


773 


3608.L07.gz43 514125 


M32684 


Homo sapiens ITGB3 gene, intron 13, 
fragment B, partial sequence 


3.60E-07 
774


3608.L14.gz43 514237 


Z34845 


H. sapiens serotonin transporter gene 


8.60E-07 


775 


3608.N09.gz43 514159 


AK022341 


Homo sapiens cDNA FLJ12279 fis, clone
MAMMA1001743, weakly similar to Y
BOX BINDING PROTEIN-1


2.00E-06 


776 


3608.N19.gz43 514319 


M15085 


T.brucei expressed copy of the ILTat 1.3 
variable surface glycoprotein gene, 5' flank 


7.80E-08 


777 


3608.N20.gz43 514335 


AF026169 


Homo sapiens SALF (SALF) mRNA, 
complete cds 


1.00E-05 


778 


3608.O04.gz43 514080 


U85193 


Human nuclear factor I-B2 (NFIB2) mRNA, 
complete cds 


7.10E-07 


779 


3608.P22.gz43 514369 


AF124241 


Callerya australis chloroplast tRNA-Leu
(trnL) gene, intron sequence


3.90E-07 


780 


3611.A17.gz43 514658 


X01412 


Drosophila melanogaster genes for tRNA- 
Val and tRNA-Pro (90BC tRNA locus) 


2.00E-06 


781 


3611.B11.gz43 514563


AL049938 


Homo sapiens mRNA; cDNA 
DKFZp564P1916 (from clone 
DKFZp564P1916); partial cds 


9.80E-10 


782 


3611.B16.gz43 514643 


M86514 


Rat proline-rich protein mRNA, 3' end


1.30E-05 


783 


3611.C09.gz43 514532 


U55950 


Pleurodeles waltl cytochrome b (CYT-b) 
gene, mitochondrial gene encoding 
mitochondrial protein, partial cds 


2.00E-06 


784 


3611.E07.gz43 514502 


AF261009 


Lethrinus miniatus clone 89rte, 
microsatellite sequence


1.70E-12 


785 


3611.E12.gz43 514582 


M60200 


Rat vitamin D binding protein gene, exons 5 


1.50E-05 


786 


3611.E20.gz43 514710 


BC002458 


Homo sapiens, clone IMAGE:3343171,
mRNA, partial cds 


2.00E-06 


787 


3611.F15.gz43 514631 


U28328 


Bos taurus dinucleotide repeat RM154, 
tandem repeat region 


4.30E-27 


788 


3611.H10.gz43 514553 


AE003147 


Drosophila melanogaster genomic scaffold 
142000013385388, complete sequence 


6.00E-07 


789 


3611.H22.gz43 514745 


X16135 


Human mRNA for novel heterogeneous 
nuclear RNP protein, L protein 


7.00E-06 


790 


3611.I04.gz43 514458 


AK001460 


Homo sapiens cDNA FLJ10598 fis, clone 
NT2RP2004841 


5.10E-44 


791 


3611.I13.gz43 514602 


M58380 


Arabidopsis thaliana peroxidase (neutral,
prxCa) gene, complete cds 


3.00E-06 


792 


3611.J04.gz43 514459 


S81486 


p53 {alternatively spliced, intron 9}
[human, Genomic Mutant, 133 nt]


1.20E-07 


793 


3611.J15.gz43 514635 


AC008240 


Leishmania major chromosome 22 clone 
L9259 strain Friedlin, complete sequence 


4.90E-05 


794 


3611.J17.gz43 514667 


Z17425 


Lilium speciosum for two putative cds's


8.90E-07 


795 


3611.J22.gz43 514747 


U60736 


Human IgHC locus intergenic sequence 


4.60E-07 


796 


3611.K01.gz43 514412 


AE001377 


Plasmodium falciparum chromosome 2, 
section 14 of 73 of the complete sequence 


3.00E-06 


797 


3611.K12.gz43 514588 


X02367 


Glaucoma chattoni rDNA 3' NTS


9.80E-08 


798 


3611.L22.gz43 514749 


U19361 


Petromyzon marinus neurofilament subunit 
NF-180 mRNA, complete cds 


5.40E-08 
799


3611.M18.gz43 514686 


X95301 


D.rerio mRNA for HER-5 protein 


1.00E-06 


800 


3611.M24.gz43 514782 


AF010239 


Caenorhabditis elegans glutathione S- 
transferase (CeGST1) mRNA, complete cds


7.70E-07


801 


3611.N01.gz43 514415 


L19300 


Staphylococcus aureus DNA sequence 
encoding three ORFs, complete cds; 
prophage phi-11 sequence homology, 5' 
flank 


1.00E-06 


802 


3611.N09.gz43 514543 


U50382 


partial cds 


7.00E-06 


803 


3611.O16.gz43 514656


AB056785 


Macaca fascicularis brain cDNA 


6.60E-07


804 


3611.P08.gz43 514529 


AK026905 


Homo sapiens cDNA: FLJ23252 fis, clone 
COL04668 


8.00E-06 


805 


3614.C18.gz43 515060 


AF239178 


Paracoccidioides brasiliensis lon proteinase
gene, complete cds; nuclear gene for 
mitochondrial product 


5.00E-06 


806 


3614.D14.gz43 514997 


AB017511 


Hydra magnipapillata mRNA for PLC- 


1.20E-05 


807 


3614.D21.gz43 515109 


L10713 


Pig trinucleotide repeat 


1.80E-05 


808 


3614.E06.gz43 514870 


X99739 


M.musculus mRNA for UBC9 protein, 
containing ubiquitin box 


9.10E-07 


809 


3614.F22.gz43 515127 


AK021490 


Homo sapiens cDNA FLJ11428 fis, clone
HEMBA1001071, highly similar to
PROCOLLAGEN ALPHA 1 (III) CHAIN 
PRECURSOR 


2.00E-06 


810 


3614.G20.gz43 515096 


M86514 






811 


3614.H09.gz43 514921 


AF068289 


Homo sapiens HDCMD34P mRNA,
complete cds 


6.60E-11


812 


3614.H22.gz43 515129 


X62423 


P.falciparum pol delta gene for DNA
polymerase delta 


4.00E-06 


813 


3614.J07.gz43 514891 


X81027 


H.sapiens tal-1 DNA


1.30E-05


814 


3614.K22.gz43 515132 


X63073 


phycoerythrin beta and alpha subunits


1.60E-05 


815 


3614.L13.gz43 514989 


V01561 


Mouse dispersed repetitive DNA sequences 
of the R-family and simple sequence DNA;
member of the B1 family of mouse


3.00E-06 


816 


3614.M08.gz43 514910 


AF272983 


Homo sapiens SRC tyrosine kinase gene, 
exons 1alpha and 1a, alternatively spliced


4.00E-06 


817 


3614.O02.gz43 514816 


X58913 


Mitochondrion Drosophila eugracilis ND2
and COI genes (partial) and genes for 
tRNA-Trp, tRNA-Tyr, and tRNA-Cys 


8.50E-08 


818 


3614.O07.gz43 514896 


AL031538 


S.pombe chromosome III cosmid c1906


9.80E-07 


819 


3614.O16.gz43 515040


AB056785 


Macaca fascicularis brain cDNA 
clone:QnpA-11655, full insert sequence 


2.00E-06 


820 


3614.P11.gz43 514961


X91656 


M.musculus Srp20 gene 


4.60E-05 
821


3614.P16.gz43 515041 


Z58907 


H. sapiens CpG island DNA genomic Mse1
fragment, clone 116a6, forward read
cpg116a6.ft1a


3.20E-70 


822 


3617.B16.gz43 515411 


AF098275 


Homo sapiens PSI2TOM20 pseudogene, 
complete sequence 


1.10E-67 


823 


3617.C21.gz43 515492 


AJ009913 


Bos taurus pip gene 


3.40E-05 


824 


3617.F10.gz43 515319 


L07487 


Bradyrhizobium japonicum heme-copper
oxidase subunit I homolog (fixN), 
cytochrome c (fixO), transmembrane 
proteins (fixO and fixQ) diheme cytochrome 
c (fixP) and fixG genes, complete cds 


6.70E-05 


825 


3617.H16.gz43 515417 


X54192 


O.sativa GluB-2 gene for glutelin 


2.00E-06 


826 


3617.I01.gz43 515178


AL513316 


Human DNA sequence from clone RP11-
522O3 on chromosome 10, complete
sequence [Homo sapiens] 


7.20E-08 


827 


3617.L16.gz43 515421 


AE007662 


Clostridium acetobutylicum ATCC824 
section 150 of 356 of the complete genome 


3.00E-06 


828 


3617.L21.gz43 515501 


AL031538 


S.pombe chromosome III cosmid c1906


1.00E-06 


829 


3617.M08.gz43 515294 


X64802 


H. sapiens F8 mRNA for Interleukin-1-like
species


3.40E-58 


830 


3617.M13.gz43 515374 


Z79239 


H.sapiens flow-sorted chromosome 6 TaqI 
fragment, SC6pA26F6 


1.10E-07 


831 


3617.N05.gz43 515247 


AF387666


Mandrillus cytomegalovirus strain OCOM6- 
2 glycoprotein B (gB) gene, partial cds 


1.00E-06 


832 


3617.N10.gz43 515327 


AB017511 


Hydra magnipapillata mRNA for PLC- 
betaH1, complete cds


1.10E-05 


833 


3617.N14.gz43 515391 


AJ249346 


Mus musculus Ankrd2 gene for ankyrin
repeat domain 2 (stretch responsive 
muscle), exons 1-9 


1.00E-05 


834 


3617.N19.gz43 515471 


U27037 


Fistulina hepatica mitochondrial small
subunit ribosomal RNA mitochondrial 
gene, partial sequence 


2.00E-06 


835 


3617.P11.gz43 515345


AK002100 


Homo sapiens cDNA FLJ11238 fis, clone
PLACE1008532 


1.20E-13 


836 


3617.P12.gz43 515361 


U04860 


Rattus norvegicus Sprague-Dawley All 
receptor mRNA, complete cds 


8.00E-05 


837 


3617.P13.gz43 515377 


AE007356 


Streptococcus pneumoniae section 39 of 
194 of the complete genome 


3.80E-05 


838 


3620.B03.gz43 515810 


AF238884 


Botrytis virus F, complete genome 


6.00E-06 


839 


3620.B24.gz43 516146 


AF244812 


Homo sapiens SCAN domain-containing
protein 2 (SCAND2) gene, complete cds, 
alternatively spliced 


1.30E-07 


840 


3620.E12.gz43 515957 


X95301 


D.rerio mRNA for HER-5 protein 


1.00E-06 


841 


3620.E13.gz43 515973 


X52289 


Human (D21S167) DNA segment 
containing (GT)19 repeat 


2.50E-19 


842 


3620.E17.gz43 516037 


AJ002414 


Arabidopsis thaliana mRNA for a hnRNP-
like protein 


9.70E-08 
843


3620.E19.gz43 516069 


X16982 


3' flanking DNA


2.70E-07 




J0ZU.riZj.gZ4O J101JJ 


Z49438


S.cerevisiae chromosome X reading frame
ORF YJL163c


3.00E-06




3620.E24.gz43 516149


M75883 


Human sterol carrier protein X/sterol carrier 


8.00E-06


846 


3620.G17.gz43 516039 


U92971 


Human protease-activated receptor 3 
(PAR3) mRNA, complete cds 


3.80E-07 




3620.G23.gz43 516135




X.laevis mRNA XLFLI 


1.60E-05 


848 


3620.J18.gz43 516058 


U37373 


Xenopus laevis tail-specific thyroid 

complete cds


3.00E-06 


849 


3620.K19.gz43 516075 


U31780 


Human papillomavirus type 22, complete 


5.00E-06 


850 


3620.K24.gz43 516155 


M95627 


Homo sapiens angio-associated migratory 
cell protein (AAMP) mRNA, complete cds 


6.00E-06 


851 


3620.O23.gz43 516143 


L11172


Plasmodium falciparum RNA polymerase I 
gene, complete cds 


1.00E-05 






AF132745


Mus musculus Sox2 gene, regulatory region


7.70E-07 


853 


3623.E03.gz43 516197 


X82566 


M.musculus glyT1 gene (exon 0a)


1.80E-09 


854 


3623.E15.gz43 516389 


AF104420 


RNA dependent RNA polymerase gene, 
partial cds; virus envelope protein spike (S), 
envelope protein (sM), envelope protein 

cds; and unknown genes 


2.90E-05 


855 


3623.F03.gz43 516198 


AJ009936 


receptor PRR1 


1.70E-05 


856 


3623.F20.gz43 516470 


U22657 


cellular morphology 


5.80E-05 




J50Zj.VJl4.gZ4o 010J5 ID 


AB035309 


Paramecium caudatum PcTERT mRNA for 
telomerase reverse transcriptase, complete 


3.00E-06 


858 


3623.H07.gz43 516264 


Z17324


Homo sapiens of MUC1 gene encoding 
Mucin 


1.80E-07 




3623.H10.gz43 516312


AB033070


Homo sapiens mRNA for KIAA1244 
protein, partial cds 


2.80E-05 


860 


3623.H23.gz43 516520 


AF131763 


Homo sapiens clone 25232 mRNA sequence 


1.70E-05 


861 


3623.I08.gz43 516281 


M60421 


Human cytochrome P450scc gene, 5' end 
and promoter region 


2.80E-05 


862 


3623.I11.gz43 516329


AK013191 


Mus musculus 10, 11 days embryo cDNA,
RIKEN full-length enriched library,
clone:2810429I04, full insert sequence 


3.00E-06 


863 


3623.L05.gz43 516236 


AJ131991 


Linum usitatissimum target sequence for 
LIS-1 insertion in PI 


3.00E-06 
864


3623.L24.gz43 516540 


U09377 


Arabidopsis thaliana GF14chi isoform 
(GRF1) gene, complete cds 


3.00E-06 


865 


3623.M10.gz43 516317 


AF071743 


(TOP2A) gene, exons 25, 26, and 27 


4.00E-06 


866 


3623.N23.gz43 516526 


U57489 


Eubacterium sp. VPI 12708 bile acid-
inducible operon bile acid-coenzyme A
ligase (baiB), BaiC, BaiD, bile acid 7-alpha
dehydratase (baiE), 3-alpha-hydroxysteroid
dehydrogenase (baiA2), BaiF, bile acid
transporter (baiG), NADH:flavin
oxidoreductase (bai> 


3.70E-05 


867 


3623.P22.gz43 516512 


U37761 


flanking region 


1.40E-12 


868 


3626.A10.gz43 516689 


D30745 


Xenopus laevis MRP RNA gene 


2.00E-07 


869 


3626.C16.gz43 516787 


AF241271 




1.60E-08 


870 


3626.E07.gz43 516645 


AF053496 


Caenorhabditis elegans beta chain spectrin 
homolog Smal (smal) mRNA, complete 
cds 


2.00E-06 


871 


3626.F03.gz43 516582 


AJ009771 


Homo sapiens mRNA for putative RING 


2 00E-06 


872 


3626.G01.gz43 516551 


BC010926 


Homo sapiens, Similar to H4 histone family,
member A, clone MGC:13512
IMAGE:4273904, mRNA, complete cds 


1.00E-43 


873 


3626.I20.gz43 516857 


AK025762 


Homo sapiens cDNA: FLJ22109 fis, clone 
HEP18091 


5.80E-07 


874 


3626.I23.gz43 516905


S55615 


(156)=G surface antigen {3' region, 
restriction fragment EG4} [Paramecium 


3.40E-07 


875 


3626.M13.gz43 516749 


AE001398 


Plasmodium falciparum chromosome 2, 


4.00E-06 




3626.M15.gz43 516781 


AF090925 


Homo sapiens clone HQ0452 PRO0452 


3.10E-07 


877 


3626.N07.gz43 516654 


Z58907 


H.sapiens CpG island DNA genomic Mse1
fragment, clone 116a6, forward read
cpg116a6.ft1a


2.90E-70 


878 


3626.N24.gz43 516926 


AF041373 


Rattus norvegicus clathrin assembly protein 
short form (CALM) mRNA, complete cds 


8.90E-08 


879 


3626.O08.gz43 516671 


D10445


Mouse mRNA for protein C, complete cds 


5.00E-06 


880 


3626.P11.gz43 516720


L48479 


Homo sapiens (subclone 6_h1 from P1 H21)
DNA sequence 


2.20E-07 


881 


3626.P14.gz43 516768 


X15028 


Chicken hsp90 gene for 90 kDa-heat shock 
protein 5'-end


3.80E-05 


882 


3629.A16.gz43 517169 


U16958 


Mus musculus pre-T cell receptor alpha-
type chain precursor mRNA, complete cds 


4.00E-06 


883 


3629.B14.gz43 517138 


X16982 


Drosophila melanogaster micropia-Dm11
3' flanking DNA


2.50E-07 
884


3629.C14.gz43 517139 


Z22537 


C.parvum precursor of oocyst wall protein 


5.00E-06 


885 


3629.E01.gz43 516933 


D00621 


Sus scrofa gene for follicle stimulation 
hormone beta subunit, exons 1, 2, 3, 


3.50E-05


886 


3629.E20.gz43 517237 


AE006900 


Sulfolobus solfataricus section 259 of 272 of 
the complete genome 


9.00E-06 


887 


3629.F24.gz43 517302 


Y10531 


Clostridium perfringens sod gene for
superoxide dismutase 


2.00E-06 


888 


3629.H10.gz43 517080 


J03654 


Human immunodeficiency virus type 2, 
isolate HIV2FG 


8.00E-06 


889 


3629.H12.gz43 517112 


AF017266


Danio rerio glutamate decarboxylase 
(GAD67) mRNA, partial cds


6.50E-07


890 


3629.I11.gz43 517097


AF020810 


Salmonella enterica VirK (virK), Mig-14 
(mig-14), NxiA (nxiA), TctE (tctE), TctD 
(tctD), TctC (tctC), TctB (tctB), and TctA 
(tctA) genes, complete cds; and O360
(o360) gene, partial cds 


3.00E-06 


891 


3629.I16.gz43 517177 


AE007643 


Clostridium acetobutylicum ATCC824




892 


3629.J03.gz43 516970 


AB017511 


Hydra magnipapillata mRNA for PLC- 




893 


3629.J07.gz43 517034 


M20782 


Human alpha-2-plasmin inhibitor gene, 


2.90E-11


894 


3632.C11.gz43 517475


AF026148 


Perilla frutescens beta-ketoacyl-ACP 
synthase I (KAS I) mRNA, complete cds


1.00E-06


895 


3632.C17.gz43 517571


U50534


Human BRCA2 region, mRNA sequence 
CG003 


1.00E-05


896 


3632.F07.gz43 517414 


M12036 


(HER2) gene, partial cds 


4.70E-10 


897 


3632.G01.gz43 517319 


AC006621 


Caenorhabditis elegans cosmid C52A10 
complete sequence 


3.40E-05 


898 


3632.I20.gz43 517625 


AK024381 


Homo sapiens cDNA FLJ14319 fis, clone
PLACE3000406 


9.00E-06 


899 


3632.K20.gz43 517627 


M27634 


Vaccinia virus P4a major core protein gene, 
complete cds 


9.60E-05 


900 


3632.M08.gz43 517437 


X75304 




8.00E-06 


901 


3632.M13.gz43 517517 


U18191 


Human HLA class I genomic survey 


3.20E-07 


902 


3632.M19.gz43 517613 


AF012131


Homo sapiens brachyury variant B (TBX1) 
mRNA, complete cds 


3.70E-07 


903 


3632.N13.gz43 517518 


AF287491 


Oncorhynchus mykiss MHC class I heavy 
chain precursor (Onmy-UBA) mRNA, 
Onmy-UBA*601 allele, complete cds 


2.00E-06 


904 


3632.N21.gz43 517646 


X62423 


P.falciparum pol delta gene for DNA
polymerase delta 


4.00E-06 


905 


3632.O06.gz43 517407 


BC009868 


Homo sapiens, replication protein A3 
(14kD), clone MGC:16404 
IMAGE:3940438, mRNA, complete cds 


1.40E-18 
906


3632.P07.gz43 517424 


AE001066 


Archaeoglobus fulgidus section 41 of 172 of
the complete genome 


3.00E-06 


907 


3635.A06.gz43 517777 


AK005546 


Mus musculus adult female placenta cDNA, 
RIKEN full-length enriched library,
clone:1600027G01, full insert sequence


1.40E-07 


908 


3635.A08.gz43 517809 


Z49280 


S.cerevisiae chromosome X reading frame 
ORF YJL005w


6.00E-06 


909 


3635.A13.gz43 517889 


AF143236 


Homo sapiens apoptosis related protein APR 
2 mRNA, complete cds 


2.00E-06 


910 


3635.D07.gz43 517796 


M58150 


Bovine lactoperoxidase (LPO) mRNA, 
complete cds 


3.10E-05 


911 


3635.F01.gz43 517702 


Y19128 


Homo sapiens enteropeptidase gene, exon 6 


3.00E-09 


912 


3635.F06.gz43 517782 


X63073 


Pseudanabaena sp. cpeBA operon encoding 
phycoerythrin beta and alpha subunits 


1.50E-05 


913 


3635.F10.gz43 517846 


AF107688 


Aedes aegypti clone 431 Feilai family of
SINES 


3.50E-05 


914 


3635.H20.gz43 518008 


AE000613 


Helicobacter pylori 26695 section 91 of 134 
of the complete genome 


1.10E-05 


915 


3635.J06.gz43 517786 


U15018 


Dugbe virus L protein gene, complete cds 


1.10E-05 


916 


3635.J09.gz43 517834 


X85444 


G.pallida repetitive DNA element


2.10E-08 


917 


3635.K05.gz43 517771 


AF090432 


Danio rerio serrateB mRNA, complete cds 


4.00E-06 


918 


3635.K06.gz43 517787 


AJ276631 


Capsicum annuum partial kn gene for
Knolle protein, promoter region 


6.10E-07 


919 


3635.M18.gz43 517981 


AL591498 


Human DNA sequence from clone RP11-
113L12 on chromosome 13, complete 
sequence [Homo sapiens] 


1.40E-05 


920 


3635.O01.gz43 517711 


AF081788 


Homo sapiens putative spliceosome 
associated protein mRNA, complete cds 


3.70E-30 


921 


3635.O14.gz43 517919


X72224 


S.cerevisiae genes HSS1, NPL4 and HSP 


6.00E-06 


922 


3635.P17.gz43 517968 


AF242307 


Euphorbia esula sucrose transport protein 
mRNA, complete cds 


2.90E-10 


923 


3635.P18.gz43 517984 


AF078780 


Caenorhabditis elegans cosmid C04F2, 
complete sequence 


1.74E-04 


924 


3638.A02.gz43 518097 


M17988 


Spiroplasma virus 4 (SpV4) replicative 
form, complete genome 


4.00E-06 


925 


3638.A24.gz43 518449 


AF064079 


Plasmodium gallinaceum endochitinase
precursor, mRNA, complete cds 


1.60E-07 


926 


3638.F15.gz43 518310 


AJ297538 


Homo sapiens partial RARA gene, intron 2 


4.00E-06 


927 


3638.H07.gz43 518184 


AK026258 


Homo sapiens cDNA: FLJ22605 fis, clone 
HSI04743 


2.00E-06 


928 


3638.J09.gz43 518218 


U89651 


Homo sapiens matrix metalloproteinase 
MMPRasi-1 gene, promoter region 


8.10E-08 


929 


3638.K06.gz43 518171 


AL139329 


Human DNA sequence from clone RP11-
228P1 on chromosome 6, complete 
sequence [Homo sapiens] 


4.40E-11 
930


3638.L10.gz43 518236 


D26532 


Mouse mRNA for transcription factor 
PEBP2aB2, complete cds 


2.00E-08 


931 


3638.N05.gz43 518158 


X62294 


B.taurus mRNA for adrenal angiotensin II 
type-1 receptor 


9.00E-06 


932 


3643.D21.gz43 518788 


U17010 


Allomyces macrogynus mitochondrion 
NADH dehydrogenase subunit 5 (nad5)
gene, complete cds 


1.80E-05 


933 


3643.E24.gz43 518837 


AL022342 


Human DNA sequence from clone RP1- 
29M10 on chromosome 20, complete 
sequence [Homo sapiens] 


6.70E-05 


934 


3643.F07.gz43 518566 


M73962 


Bovine pregnancy-associated glycoprotein 1 
mRNA, complete cds 


6.00E-06 


935 


3643.G20.gz43 518775 


AF191214 


Homo sapiens isovaleryl dehydrogenase 
(IVD) gene, exons 1-3 


1.00E-05 


936 


3643.G24.gz43 518839 


AK025682 


Homo sapiens cDNA: FLJ22029 fis, clone 
HEP08661 


6.00E-06 


937 


3643.H09.gz43 518600 


AK024381 


Homo sapiens cDNA FLJ14319 fis, clone
PLACE3000406 


1.70E-05 


938 


3643.I01.gz43 518473 


AF000306 


Brassica napus steroid sulfotransferase 2 
gene, complete cds 


3.00E-06 


939 


3643.I02.gz43 518489 


X58433 


B.subtilis cad gene for lysine decarboxylase


2.30E-05 


940 


3643.I18.gz43 518745 


M14872 


Mouse GnRH-GAP gene encoding 
gonadotropin-releasing hormone and Gn- 
RH-associated peptide (GAP) 


4.00E-06 


941 


3643.I24.gz43 518841 


BC003813 


Mus musculus, clone MGC:6139
IMAGE:3487295, mRNA, complete cds 


2.30E-07 


942 


3643.K06.gz43 518555 


AL050124 


Homo sapiens mRNA; cDNA 
DKFZp586E151 (from clone 
DKFZp586E151) 


1.60E-07 


943 


3643.L01.gz43 518476 


AJ278429 


Mus musculus partial Prkar1a gene for
cAMP-dependent protein kinase regulatory 
subunit RIalpha, exons 8-10 and 3'UTR 


3.00E-06 


944 


3643.N24.gz43 518846 


BC006511 


Homo sapiens, clone IMAGE:3010441, 
mRNA 


1.00E-05 


945 


3643.O16.gz43 518719


AE002303 


Chlamydia muridarum, section 34 of 85 of
the complete genome 


1.10E-05 


946 


3643.O18.gz43 518751


V00248 


Drosophila gene for yolk protein I 
(vitellogenin) 


2.00E-06 


947 


3643.O21.gz43 518799


AE000614 


Helicobacter pylori 26695 section 92 of 134 
of the complete genome 


1.40E-05 


948 


3643.P13.gz43 518672 


Y17693 


Bungarus multicinctus gene encoding alpha- 
bungarotoxin, V31 variant 


2.00E-07 


949 


3643.P14.gz43 518688 


AF109352 


Euperipatoides rowelli microsatellite PI 8 
sequence 


8.80E-10 


950 


3646.A07.gz43 518945 


X55137 


H. giganteus type II restriction-modification
system HgiBI 


3.00E-06 


951 


3646.A09.gz43 518977 


AF074963 


Rattus norvegicus endothelin-B receptor
(EDNRB) gene, partial cds 


2.10E-07 
952


3646.A12.gz43 519025 


AF176208 


Homo sapiens EcoRI-HindIII fragment
upstream of exon 1 of the c-myc gene 


1.60E-05 


953 


3646.A13.gz43 519041 


X89445 


O.chalybea DNA for narB gene and partial 
ORFs 


4.00E-05 


954 


3646.B20.gz43 519154 


M86514 


Rat proline-rich protein mRNA, 3' end 


1.60E-05 


955 


3646.C06.gz43 518931 


Z71180 


Caenorhabditis elegans cosmid F22E12, 
complete sequence 


2.03E-04 


956 


3646.C16.gz43 519091 


U73608 


Hepatitis B virus, genome 7648 with G->A 
hypermutations


2.30E-05 


957 


3646.E02.gz43 518869 


U11683 


Trypanoplasma borreli Tt-JH mitochondrion 
cytochrome c oxidase subunit 1 (cox1) gene,
complete cds 


8.10E-07


958 


3646.E20.gz43 519157 


AE006216 


Pasteurella multocida PM70 section 183 of
204 of the complete genome 


2.30E-05 


959 


3646.H04.gz43 518904 


AF043740 


Branchiostoma floridae amphioxus Otx 
transcription factor (Otx) mRNA, complete 
cds 


2.00E-06 


960 


3646.H09.gz43 518984 


AP000145 


Homo sapiens genomic DNA, chromosome 
21q21.2, LL56-APP region, clone
B2291C14-R44F3, segment 10/10, complete 
sequence 


2.90E-40 


961 


3646.H16.gz43 519096 


U22342 


Bacteriophage T270 integrase (int) gene, 
complete cds 


1.00E-07 


962 


3646.I01.gz43 518857


X54486 


Human gene for C1-inhibitor


6.80E-05 


963 


3646.J03.gz43 518890 


AB055372 


Macaca fascicularis brain cDNA, 
clone:QflA-12842 


5.40E-190 


964 


3646.J22.gz43 519194 


AL133032 


Homo sapiens mRNA; cDNA 
DKFZp586B0317 (from clone
DKFZp586B0317) 


2.00E-06 


965 


3646.K14.gz43 519067 


AF239178 


Paracoccidioides brasiliensis lon proteinase
gene, complete cds; nuclear gene for 
mitochondrial product 


4.00E-06 


966 


3646.L17.gz43 519116 


Z58907 


H. sapiens CpG island DNA genomic Mse1
fragment, clone 116a6, forward read
cpg116a6.ft1a


2.50E-70 


967 


3646.O13.gz43 519055


AL050391 


Homo sapiens mRNA; cDNA 
DKFZp586A181 (from clone 
DKFZp586A181); partial cds 


5.20E-08 


968 


3646.O16.gz43 519103


X00331 


Drosophila virilis simple DNA sequence 
(pDV-161) 


5.20E-08 


969 


3646.P09.gz43 518992 


U04527 


Borrelia burgdorferi 212 DNA gyrase b
subunit (gyrB) and ribonuclease P protein 
component (rnpA) genes, partial cds, DnaA 
protein (dnaA), DNA polymerase III beta 
subunit (dnaN), and ribosomal protein L34 
(rpmH) genes, complete cds 


5.00E-06 


970 


3646.P14.gz43 519072 


AY032863 


Mus musculus chloride-formate exchanger
mRNA, complete cds 


8.00E-06 


971 


3646.P17.gz43 519120 


U19569 


Human squamous cell carcinoma antigen 
(SCCA2) gene, exon 1 


1.20E-07 
972


3661.A08.gz43 519483 


AB017511 


Hydra magnipapillata mRNA for PLC- 
betaH1, complete cds


1.20E-05 


973 


3661.D17.gz43 519630 


J03488 


Reovirus type 3 L2 gene encoding 
guanylyltransferase, complete cds 


3.00E-06 


974 


3661.D18.gz43 519646 


AB033024 


Homo sapiens mRNA for KIAA1198 
protein, partial cds 


1.90E-11 


975 


3661.E19.gz43 519663 


AB014084 


Homo sapiens genomic DNA, chromosome 
6p21.3, HLA class I region, Cosmid 
clone:TY7A5, complete sequence 


6.00E-05 


976 


3661.E23.gz43 519727 


AE001032 


Archaeoglobus fulgidus section 75 of 172 of 
the complete genome 


5.30E-05 


977 


3661.F14.gz43 519584 


X15063 


Plasmodium falciparum mRNA for major
merozoite surface antigen gp195


6.80E-05 


978 


3661.G16.gz43 519617 


AF255609 


Homo sapiens high mobility group protein 
HMG1 gene, exons 1 and 2, partial cds 


2.70E-07 


979 


3661.G20.gz43 519681 


AK021558 


Homo sapiens cDNA FLJ11496 fis, clone 
HEMBA1001964 


6.40E-09 


980 


3661.H11.gz43 519538


Z30705 


Puumala virus (Evo/15Cg/93) gene for N 
protein 


3.90E-07 


981 


3661.H24.gz43 519746 


X66979 


X.laevis mRNA XLFLI


1.60E-05 


982 


3661.I22.gz43 519715 


AF029887 


Caenorhabditis elegans UNC-129 (unc-129)
mRNA, complete cds 


5.00E-06 


983 


3661.J15.gz43 519604 


AJ297538 


Homo sapiens partial RARA gene, intron 2 


4.00E-06 


984 


3661.K22.gz43 519717 


AK002100 


Homo sapiens cDNA FLJ11238 fis, clone
PLACE1008532 


1.30E-13 


985 


3661.L19.gz43 519670 


AL589643 


Human DNA sequence from clone RP11-
344C1 on chromosome 6, complete 
sequence [Homo sapiens] 


2.20E-05 


986 


3661.M03.gz43 519415 


Z57613 


H.sapiens CpG island DNA genomic Mse1
fragment, clone 187a12, forward read
cpg187a12.ft1a


1.20E-08 


987 


3661.M23.gz43 519735 


X79547 


Equus caballus mitochondrial DNA 
complete sequence 


5.80E-05 


988 


3661.P22.gz43 519722 


AF055668 


Mus musculus apoptosis-linked gene 4,
deltaC form (Alg-4) mRNA, partial cds


8.00E-06 


989 


3662.A13.gz43 519947 


Z49438 


S.cerevisiae chromosome X reading frame
ORF YJL163c


3.00E-06 


990 


3662.B13.gz43 519948 


AB045237 


Xenopus laevis XRPTPb mRNA for 
receptor-type protein tyrosine phosphatase 
beta.11, complete cds


7.00E-06 


991 


3662.C10.gz43 519901 


BC007905 


Homo sapiens, Similar to retinal 
degeneration B beta, clone MGC:14375
IMAGE:4299595, mRNA, complete cds


1.20E-09 


992 


3662.C15.gz43 519981 


M33864 


Human (clone HGL-3) interstitial retinoid-
binding protein 3 (RBP3) gene, exon 1


1.20E-05 


993 


3662.F13.gz43 519952 


AB040935 


Homo sapiens mRNA for KIAA1502 
protein, partial cds


1.20E-61 


994 


3662.H14.gz43 519970 


AB032757 


Mus musculus gad65 gene for glutamate
decarboxylase 65, partial cds 


8.00E-07 
995


3662.H23.gz43 520114 


AK013013 


Mus musculus 10, 11 days embryo cDNA, 
RIKEN full-length enriched library, 
clone:2810406L04, full insert sequence 


2.00E-06 


996 


3662.H24.gz43 520130 


D45371 


Human apM1 mRNA for GS3109 (novel
adipose specific collagen-like factor),
complete cds 


9.60E-10 


997 


3662.J05.gz43 519828 


M83554 


H.sapiens lymphocyte activation antigen
CD30 mRNA, complete cds 


1.40E-05 


998 


3662.J08.gz43 519876


Z11876 


B.hermsii vmp7 gene encoding Vmp7 outer
membrane lipoprotein 


1.11E-04 


999 


3662.J09.gz43 519892 


AB011101 


Homo sapiens mRNA for KIAA0529 
protein, partial cds 


6.30E-05 


1000 


3662.J16.gz43 520004 


U00484 


Anabaena PCC7120 protein kinase PknA
(pknA) gene, complete cds 


2.00E-06 


1001 


3662.K03.gz43 519797 


AL390145 


Homo sapiens mRNA; cDNA 
DKFZp762C115 (from clone 
DKFZp762C115) 


1.40E-05 


1002 


3662.L05.gz43 519830 


U63635 


Schizosaccharomyces pombe RNA lariat 
debranching enzyme (Sp-dbrl) gene, 
complete cds 


5.80E-10 


1003 


3662.N24.gz43 520136 


Z30709 


L.helveticus genes for prolinase and 
putative ABC transporter


3.70E-05 


1004 


3662.O02.gz43 519785 


AF084460 


Gallus gallus potassium channel Shaker 
alpha subunit variant cKv1.4(m) mRNA,
complete cds 


6.90E-05 


1005 


3662.P03.gz43 519802 


AJ011456 


Schinziella tetragona matK gene
(corresponding location in Tobacco: 963- 
1244) 


7.20E-08 


1006 


3663.A09.gz43 520267 


Z69608 


Arara SSU rRNA gene (partial) 


3.30E-07 


1007 


3663.C08.gz43 520253 


Z50756 


Caenorhabditis elegans cosmid T08D10, 
complete sequence 


7.60E-07 


1008 


3663.C19.gz43 520429 


Z22672 


H.sapiens cacnl1a3 gene encoding skeletal
muscle dhp-receptor alpha 1 subunit


2.80E-07 


1009 


3663.E04.gz43 520191 


U89318 


Homo sapiens nucleophosmin 
phosphoprotein (NPM) gene, intron 9,
partial sequence 


2.60E-07 


1010 


3663.F15.gz43 520368 


U66073 


Tritrichomonas foetus putative superoxide 
dismutase 1 (SOD1) gene, complete cds 


9.20E-07 


1011 


3663.F22.gz43 520480 


U36786 


Rattus norvegicus putative pheromone 
receptor VN7 mRNA, complete cds 


7.10E-07 


1012 


3663.G01.gz43 520145 


AK024359 


Homo sapiens cDNA FLJ14297 fis, clone
PLACE1008941 


9.50E-36 


1013 


3663.G08.gz43 520257 


L19339 


Molgula oculata zinc finger protein (manx) 
mRNA, complete cds 


5.20E-07 


1014 


3663.H20.gz43 520450 


X61307 


Staphylococcus aureus spa gene for protein 
A 


5.00E-06 


1015


3663.J06.gz43 520228 


AE007916 


Agrobacterium tumefaciens strain C58
plasmid AT, section 44 of 50 of the 
complete sequence 


2.02E-04 


1016 


3663.J16.gz43 520388 


U38181 


Leuconostoc mesenteroides dextransucrase
gene, complete cds 


3.90E-07 


1017 


3663.K02.gz43 520165 


X68339


Mycoplasma-like organism (substrain 
ASHY) DNA for 16S rRNA


5.00E-06 


1018 


3663.K13.gz43 520341 


AF155221 


Mus musculus matrix metalloproteinase 19
(Mmp19) mRNA, complete cds


2.00E-06 


1019 


3663.L18.gz43 520422 


AB031056 


Solobacterium moorei gene for 16S rRNA, 
isolate:RCA59-74 


1.00E-06 


1020 


3663.L24.gz43 520518 


D10445 


Mouse mRNA for protein C, complete cds 


6.00E-06 


1021 


3663.M24.gz43 520519


AE001196 


Treponema pallidum section 12 of 87 of the 
complete genome 


5.20E-05 


1022 


3663.N09.gz43 520280 


AF081788 


Homo sapiens putative spliceosome 
associated protein mRNA, complete cds 


4.00E-20 


1023 


3663.N10.gz43 520296 


X62423 


P.falciparum pol delta gene for DNA 
polymerase delta 


4.00E-06 


1024 


3663.N12.gz43 520328 


AF178079 


Zygosaccharomyces rouxii ketoreductase 
(krd) mRNA, complete cds 


5.00E-06 


1025 


3663.N16.gz43 520392 


U41060 


Homo sapiens estrogen regulated LIV-1 
protein (LIV-1) mRNA, complete cds 


2.00E-06 


1026 


3663.O07.gz43 520249 


D00442 


Grapevine fanleaf virus satellite RNA 
(RNA3), complete cds 


1.50E-08 


1027 


3663.O09.gz43 520281 


AK002141 


Homo sapiens cDNA FLJ11279 fis, clone
PLACE1009444, highly similar to
PHOSPHATIDYLINOSITOL 4-KINASE
ALPHA (EC 2.7.1.67) 


5.30E-10 


1028 


3664.A11.gz43 520683


U67525 


Methanococcus jannaschii section 67 of 150 
of the complete genome 


4.00E-06 


1029 


3664.C21.gz43 520845 


AF064773 


Staphylococcus aureus extracellular 
enterotoxin type G precursor (SEG) gene,
complete cds 


1.30E-07 


1030 


3664.D06.gz43 520606 


AF178079


Zygosaccharomyces rouxii ketoreductase 
(krd) mRNA, complete cds 


5.00E-06 


1031 


3664.D12.gz43 520702 


U10519 


Human DNA polymerase beta gene, exon 5 


2.00E-07 


1032 


3664.D17.gz43 520782 


AK027226 


Homo sapiens cDNA: FLJ23573 fis, clone 
LNG12520 


4.90E-07 


1033 


3664.E18.gz43 520799 


AF317204


Mus musculus C-type lectin superfamily 1
gene, complete cds 


3.20E-05 


1034 


3664.E23.gz43 520879 


AB050903 


Mus musculus mRNA for a4 subunit
isoform, complete cds 


3.00E-06 


1035 


3664.E24.gz43 520895 


Z92793 


Caenorhabditis elegans cosmid H15M21, 
complete sequence 


1.20E-05 


1036 


3664.G12.gz43 520705 


AF211482 


Dictyostelium discoideum SdhA (sdhA) 
gene, complete cds 


2.30E-09 


1037 


3664.G20.gz43 520833 


M14450


Rat thyrotropin (TSH) beta-subunit gene,
exons 2 and 3 


4.00E-06 


1038 


3664.H15.gz43 520754 


Y11270 


E.histolytica INO1 gene


2.00E-06 


1039


3664.H22.gz43 520866 


X97773 


B.taurus mRNA for mitochondrial 
tricarboxylate carrier protein 


1.20E-05 


1040 


3664.J12.gz43 520708 


M58150 


Bovine lactoperoxidase (LPO) mRNA, 
complete cds 


3.20E-05 


1041 


3664.J23.gz43 520884 


U67463 


Methanococcus jannaschii section 5 of 150 
of the complete genome 


3.00E-06 


1042 


3664.K16.gz43 520773


Z83118 


Caenorhabditis elegans cosmid M04D5, 
complete sequence 


2.70E-07 


1043 


3664.K19.gz43 520821 


U36927 


Plasmodium yoelii rhoptry protein gene, 
complete cds 


3.00E-05 


1044 


3664.L21.gz43 520854 


AF057695 


Haemophilus ducreyi strain 35000 putative
phosphomannomutase (pmm) gene, partial
cds; large supernatant protein 1 (lspA1)
gene, complete cds; and putative GMP 
synthase (guaA) gene, partial cds 


2.15E-04 


1045 


3664.O22.gz43 520873


U43574 


Hydra vulgaris nucleoporin p62 gene,
complete cds 


7.00E-06 


1046 


3664.P12.gz43 520714 


AF030883 


Mus musculus tRNA-His gene, complete
sequence; platelet-activating factor
acetylhydrolase Ib alpha subunit (Pafaha-
ps1) pseudogene, complete sequence; and
tRNA-Glu gene, complete sequence 


9.00E-06 


1047 


3664.P18.gz43 520810 


Z47735 


H.sapiens NFKB1 gene, exons 11 & 12 


1.32E-04 


1048 


3665.A23.gz43 521259 


X66979 


X.laevis mRNA XLFLI 


1.60E-05 


1049 


3665.B01.gz43 520908 


M90058 


Human serglycin gene, exons 1, 2, and 3


4.00E-06 


1050 


3665.B12.gz43 521084 


AK020877 


Mus musculus adult retina cDNA, RIKEN
full-length enriched library, 
clone:A930019H03, full insert sequence 


7.10E-07 


1051 


3665.E11.gz43 521071


AB024030 


Arabidopsis thaliana genomic DNA, 
chromosome 5, TAC clone:K5A21


9.00E-06 


1052 


3665.E20.gz43 521215 


X76584 


H.sapiens simple DNA sequence region 
clone wglhl 


6.80E-08 


1053 


3665.H20.gz43 521218 


X95301 


D.rerio mRNA for HER-5 protein 


9.50E-07 


1054 


3665.K01.gz43 520917 


X04653 


Mouse mRNA for Ly-6 alloantigen (Ly- 
6E.1) 


1.30E-05 


1055 


3665.M01.gz43 520919 


AF098352 


Wiseana copularis haplotype southern 
cytochrome oxidase subunit I and
cytochrome oxidase subunit II genes, partial 
cds; mitochondrial genes for mitochondrial 
products 


5.80E-07 


1056 


3665.M21.gz43 521239 


AF257480 


Rana temporaria microsatellite SB80
sequence 


3.30E-09 


1057 


3665.M23.gz43 521271 


Y10623 


C.pallidivittatus globin gene cluster E 


1.10E-05 


1058 


3665.N24.gz43 521288 


X95301 


D.rerio mRNA for HER-5 protein 


1.00E-06 


1059 


3665.O06.gz43 521001 


AE007033 


Mycobacterium tuberculosis CDC1551, 
section 119 of 280 of the complete genome 


7.40E-05 


1060 


3665.O14.gz43 521129


AB033094 


Homo sapiens mRNA for KIAA1268 
protein, partial cds 


2.10E-08 


1061


3665.O15.gz43 521145


AK004557 


Mus musculus adult male lung cDNA,
RIKEN full-length enriched library,
clone:1200003C23, full insert sequence


1.20E-05


1062 


3665.O19.gz43 521209


AY036905 


Trichoderma atroviride protein GTPase
Tga1 (tga1) gene, complete cds


2.10E-08 


1063 


3665.O21.gz43 521241


U89293 


Homo sapiens MSH4 (HMSH4) mRNA, 




1064 


3665.O23.gz43 521273


X00048 


Herpes simplex virus (HSV) type 2 
0.580 - 0.625) 


6.00E-06 


1065 


3665.P13.gz43 521114 


Z48796 


H. sapiens SM-W mRNA for helicase 


1.70E-05 


1066 


3666.A07.gz43 521387 


AK005546 


Mus musculus adult female placenta cDNA,
RIKEN full-length enriched library,
clone:1600027G01, full insert sequence


1.20E-07 


1067 


3666.A19.gz43 521579 


AB011101 


Homo sapiens mRNA for KIAA0529 
protein, partial cds 


5.80E-05 


1068 


3666.A24.gz43 521659 


AL050208 


Homo sapiens mRNA; cDNA 
DKFZp586F2323 (from clone 
DKFZp586F2323) 


2.90E-07 


1069 


3666.B11.gz43 521452


X06932 


Petunia hsp70 gene 


3.00E-06 


1070 


3666.C18.gz43 521565 


Z22672


H.sapiens cacnl1a3 gene encoding skeletal


2.80E-07


1071 


3666.D02.gz43 521310 


AJ297538 


Homo sapiens partial RARA gene, intron 2 


4.00E-06 


1072 


3666.D11.gz43 521454


AF057695 


Haemophilus ducreyi strain 35000 putative 
phosphomannomutase (pmm) gene, partial 
cds; large supernatant protein 1 (lspA1)
gene, complete cds; and putative GMP 


2.43E-04


1073 


3666.D15.gz43 521518 


Z66194 


H.sapiens CpG island DNA genomic Mse1
cpg80b12.ft1b


1.70E-66 


1074 


3666.D16.gz43 521534 


Z66194 


fragment, clone 80b12, forward read
cpg80b12.ft1b


2.10E-37 


1075 


3666.F22.gz43 521632 


U97062 


Staphylococcus aureus NCTC 8325 SecA 
(secA) gene, complete cds 


1.20E-08 


1076 


3666.G12.gz43 521473 


J03901 


Maize pyruvate, orthophosphate dikinase 
mRNA, complete cds 


1.72E-04 


1077 


3666.I12.gz43 521475 


AJ225102 


Pinus lambertiana chloroplast DNA 
containing a SSR Black Hills (Oregon) 


6.40E-10 


1078 


3666.L01.gz43 521302 


M86227 


Staphylococcus aureus DNA gyrase B 
subunit (gyrB) RecF homologue (recF) and 
DNA gyrase A subunit (gyrA) gene, 
complete cds 


5.00E-06 



WO 2004/039943 



PCT/US2003/015465



Table 8 



SEQ 
ID 


SEQ NAME 


ACCESSION 


GENBANK DESCRIPTION 


GENBANK 
SCORE 


1079 


3666.L06.gz43 521382 


AF224725 


Trichosurus vulpecula retrovirus TvERV 
(type D) gag polyprotein (gag), protease 
(pro), and pol polyprotein (pol) genes, 
complete cds 


3.30E-08 


1080 


3666.L11.gz43 521462


AF147081 


Homo sapiens gamma-glutamyl hydrolase 
gene, exons 1 and 2 


3.30E-05 


1081 


3666.L23.gz43 521654 


AK020701 


Mus musculus 6 days neonate skin cDNA,
RIKEN full-length enriched library,
clone:A030009B12, full insert sequence


2.20E-07 


1082 


3666.M16.gz43 521543 


AF158179 


Drosophila melanogaster strain Canton-S 
Chiffon-2 (chiffon) mRNA, alternative 
splice form 2, complete cds 


4.40E-07 


1083 


3666.N06.gz43 521384 


Z48796 


H.sapiens SM-W mRNA for helicase 


1.70E-05 


1084 


3667.A15.gz43 524557 


AF005903 


Monodelphis domestica GTP-binding 
protein homolog mRNA, partial cds 


7.80E-08 


1085 


3754.A08.gz43 532949 


AF091502 


mediating protein (aggH) gene, complete 
cds 


1.00E-06 


1086 


3754.A13.gz43 533029 


U02695 


satellite DNA sequence 


7.60E-07 


1087 


3754.A16.gz43 533077 


AE006577 


Streptococcus pyogenes M1 GAS strain
SF370, section 106 of 167 of the complete
genome 


9.00E-06 


1088 


3754.B04.gz43 532886 


S83995 


PstI fragment [Chlamydia pneumoniae,
Genomic, 474 nt] 


2.00E-06 


1089 


3754.B05.gz43 532902 


AY008833 


Staphylococcus aureus tcaR-tcaA-tcaB
operon, complete sequences


5.00E-06 


1090 


3754.B07.gz43 532934 


AF270216 


Staphylococcus epidermidis strain SR1
clone step.1054h11 genomic sequence


9.50E-07 


1091 


3754.B08.gz43 532950 


AK007308 


Mus musculus adult male testis cDNA,
RIKEN full-length enriched library,
clone:1700128E15, full insert sequence 


7.00E-06 


1092 


3754.B10.gz43 532982 


AE002807 


Drosophila melanogaster genomic scaffold 
142000013385251, complete sequence 


5.40E-05 


1093 


3754.C22.gz43 533175 


D30612 


Homo sapiens mRNA for repressor protein, 
partial cds 


4.00E-06 


1094 


3754.D19.gz43 533128


L12043 


Plasmodium falciparum unidentified mRNA 
sequence 


3.00E-06 


1095 


3754.E12.gz43 533017 


AB062933 


Macaca fascicularis brain cDNA 
clone:QccE-22249, full insert sequence 


3.60E-07 


1096 


3754.E20.gz43 533145 


AL138746 


Human DNA sequence from clone RP3- 
389B13 on chromosome Xq26.2-27.1, 
complete sequence [Homo sapiens] 


8.30E-10 


1097 


3754.F01.gz43 532842 


AF086820 


Drosophila melanogaster paired-like 
homeodomain protein UNC-4 (unc-4) 
mRNA, complete cds 


8.00E-06 


1098 


3754.F08.gz43 532954 


S66402 


vascular AT1a angiotensin receptor {exon
1, promoter} [rats, Sprague-Dawley,
Genomic, 3477 nt] 


3.10E-05 


1099


3754.F11.gz43 533002


X57377 


Mouse dilute myosin heavy chain gene for 
novel heavy chain with unique C-terminal


2.10E-05


1100 


3754.F15.gz43 533066 


AJ245620 


Homo sapiens CTL1 gene


2.50E-12 


1101 


3754.F20.gz43 533146 


AE002426 


Neisseria meningitidis serogroup B strain 
MC58 section 68 of 206 of the complete 
genome 


3.70E-05 


1102 


3754.G03.gz43 532875 


AF002166 


region sequence 


1.20E-07 


1103 


3754.G08.gz43 532955 


X71020 




6.80E-07


1104 


3754.G18.gz43 533115 


AF126531 


Homo sapiens putative DNA-directed RNA 
polymerase III C11 subunit gene, complete


1.10E-13


1105 


3754.H08.gz43 532956 


L20127 


Rochalimaea henselae antigen (htrA) gene, 


4.60E-07


1106 


3754.I01.gz43 532845 


AK022138 


Homo sapiens cDNA FLJ12076 fis, clone 
HEMBB1002442, weakly similar to LIN-10 
PROTEIN 


3.90E-14 


1107 


3754.I03.gz43 532877


AF016653 


Caenorhabditis elegans cosmid C41D7, 


2.00E-06


1108 


3754.J01.gz43 532846 


U97408 


Caenorhabditis elegans cosmid F48A9


4.00E-06


1109 


3754.J05.gz43 532910 


Z35484 


Thermoanaerobacter sp. ATCC53627 cgtA


4.00E-06 


1110 


3754.J10.gz43 532990 


D17094 


Human HepG2 partial cDNA, clone 
hmd5h04m5




1111 


3754.J12.gz43 533022 


Z56695 


H.sapiens CpG island DNA genomic Mse1
fragment, clone 136d4, reverse read
cpg136d4.rt1a


1.00E-06 


1112 


3754.J24.gz43 533214 






2.50E-05


1113 


3754.K14.gz43 533055 


L79913 


Xenopus laevis rds/peripherin (rds35) 
mRNA, complete cds 


5.00E-06 


1114 


3754.K17.gz43 533103 


AE006251 


section 13 of 218 of the complete genome 


9.00E-06 


1115 


3754.K20.gz43 533151 


AB047880 


clone:QnpA-14303 


1.00E-06 


1116 


3754.M08.gz43 532961 


X58467 


Human CYP2D7 AP pseudogene for 
cytochrome P450 2D6 


4.30E-11 


1117 


3754.N16.gz43 533090 




polymerase suppressor alpha mutation gene 
(PSP2) complete cds 


1.80E-07


1118 


3754.N19.gz43 533138 


AK025312


Homo sapiens cDNA: FLJ21659 fis, clone 
COL08743 


1.40E-07 


1119 


3754.N22.gz43 533186 


AF081828 


Ixodes hexagonus mitochondrial DNA, 
complete genome 


4.00E-06 


1120 


3754.O18.gz43 533123


Z73229 


S.cerevisiae chromosome XII reading frame 
ORF YLR057w 


3.00E-06 


1121 


3754.O23.gz43 533203


AE006900 


Sulfolobus solfataricus section 259 of 272 of 
the complete genome 


1.10E-05 


1122 


3754.P13.gz43 533044 


AF220217 


Homo sapiens rsec15-like protein mRNA,
partial cds 


1.80E-10 


1123


3754.P17.gz43 533108 


AJ250862 


Bacillus sp. HIL-Y85/54728 mersacidin 
biosynthesis gene cluster (mrsK2, mrsR2, 
mrsF, mrsG, mrsE, mrsA, mrsR1, mrsD,
mrsM and mrsT genes) 


1.20E-05 


1124 


3756.A02.gz43 533237 


AF285594 


Homo sapiens testis protein TEX11 
(TEX11) mRNA, complete cds


1.10E-05 


1125 


3756.A11.gz43 533381


U43148 


Human patched homolog (PTC) mRNA,
complete cds 


4.00E-06 


1126 


3756.A13.gz43 533413 


U56861 


Nicotiana plumbaginifolia intergenic region
between lhcb1*1 and lhcb1*2 genes


1.00E-06 


1127 


3756.B03.gz43 533254 


AF101735 


Pan troglodytes isolate PTOR3A5P olfactory 
receptor pseudogene, complete sequence 


5.70E-08 


1128 


3756.B04.gz43 533270 


Z82038 


C.thermosaccharolyticum etfB, etfA, hbd,
thlA and actA genes 


1.00E-06 


1129 


3756.B15.gz43 533446 


M96151 


Mus musculus apolipoprotein B gene 
sequence 


1.13E-04 


1130 


3756.B21.gz43 533542 


Z92793 


Caenorhabditis elegans cosmid H15M21, 
complete sequence 


1.30E-05 


1131 


3756.B22.gz43 533558 


U43542 


Nicotiana tabacum diphenol oxidase 
mRNA, complete cds 


2.00E-06 


1132 


3756.C06.gz43 533303 


AB022085 


Mus musculus Cctz-2 gene for chaperonin 
containing TCP-1 zeta-2 subunit, exon 5, 6, 
7, 8, 9, 10 


7.00E-05 


1133 


3756.C16.gz43 533463 


AF143236 


Homo sapiens apoptosis related protein APR 
2 mRNA, complete cds 


5.00E-06 


1134 


3756.D08.gz43 533336 


AB049544 


Porcine enterovirus 10 gene for RNA- 
dependent RNA polymerase, partial cds 


7.20E-07 


1135 


3756.D18.gz43 533496 


X53658 


E.coli DNA fragment 


7.60E-08 


1136 


3756.D24.gz43 533592 


X96861 


H.virescens mRNA for pheromone binding 
protein 


2.40E-07 


1137 


3756.E01.gz43 533225 


AF202892 


Mus musculus Kif21a (Kif21a) mRNA, 
complete cds 


4.00E-06 


1138 


3756.E06.gz43 533305 


AF139374


Homo sapiens DIR1 protein (DIR1) gene, 
complete cds 


8.00E-06 


1139 


3756.E12.gz43 533401 


AF238884 


Botrytis virus F, complete genome 


8.00E-06 


1140 


3756.E22.gz43 533561 


U78866 


Arabidopsis thaliana putative arginine- 
aspartate-rich RNA binding protein 
(gene1500), (gene1000), and (gene400)
genes, complete cds 


5.00E-06 


1141 


3756.F11.gz43 533386


D50091 


Drosophila ezoana G-3-P dehydrogenase 
(alphaGpdh) gene, exon1-8, complete cds


2.00E-06 


1142 


3756.F16.gz43 533466 


AJ233973 


Gallus gallus microsatellite DNA GCT028
(CA) repeat 


4.20E-07 


1143 


3756.G07.gz43 533323 


AE000708 


Aquifex aeolicus section 40 of 109 of the 
complete genome 


6.00E-05 


1144 


3756.G12.gz43 533403


M84731 


Pseudomonas sp. 5-substituted hydantoin
racemase (hyuE) gene, complete cds 


1.20E-05 


1145 


3756.G14.gz43 533435 


AL116458


Botrytis cinerea strain T4 cDNA library 
under conditions of nitrogen deprivation 


6.70E-07 


1146 


3756.I03.gz43 533261 


U67550 


Methanococcus jannaschii section 92 of 150 
of the complete genome 


2.30E-05 


1147 


3756.J05.gz43 533294 


U11292 


Human Ki nuclear autoantigen mRNA,
complete cds 


7.70E-07 


1148 


3756.K03.gz43 533263 


AF073484 


Homo sapiens MHC class I-related protein 
MR1 precursor (MR1) gene, signal peptide 


8.00E-06 


1149 


3756.K07.gz43 533327 


M37499 


Human methylmalonyl CoA mutase (MUT)
gene, exon 2 


2.00E-06 


1150 


3756.K15.gz43 533455 


AF248820 


Maoricicada campbelli isolate TB-MC-016
tRNA-Asp gene, complete sequence; 
ATPase subunit 8 gene, complete cds; and 
ATPase subunit 6 gene, partial cds; 
mitochondrial genes for mitochondrial 
products 


7.30E-07 


1151 


3756.K18.gz43 533503 


M36300 


S.cerevisiae glutamine amidotransferase
(TRP3) gene, 3' end


2.30E-05 


1152 


3756.K20.gz43 533535 


AY022480 


Oryza sativa microsatellite MRG4805
containing (AGG)X8, genomic sequence 


2.00E-10 


1153 


3756.L02.gz43 533248 


X03833 


Human gene for interleukin 1 alpha (IL-1
alpha) 


2.80E-12 


1154 


3756.L03.gz43 533264 


AF244246 


Dysdera sp. MC cytochrome c oxidase I 
(COI) gene, partial cds; mitochondrial gene 
for mitochondrial product 


2.70E-07 


1155 


3756.L19.gz43 533520 


AJ002732 


Schizosaccharomyces pombe mRNA for
ribosomal protein 114 


2.00E-06 


1156 


3756.M06.gz43 533313 


AK002951 


Mus musculus adult male brain cDNA, 
RIKEN full-length enriched library, 
clone:0710001E20, full insert sequence 


3.60E-07 


1157 


3756.M07.gz43 533329 


AF057708 


Populus balsamifera subsp. trichocarpa PTD 
protein (PTD) gene, complete cds 


2.60E-07 


1158 


3756.M20.gz43 533537 


Z35821 


S.cerevisiae chromosome II reading frame 
ORF YBL060W


2.00E-06 


1159 


3756.N18.gz43 533506 


AL591667 


Human DNA sequence from clone RP11-
389N9 on chromosome 6, complete
sequence [Homo sapiens] 


6.10E-05 


1160 


3756.N21.gz43 533554 


AK026258 


Homo sapiens cDNA: FLJ22605 fis, clone 
HSI04743 


2.00E-06 


1161 


3756.O03.gz43 533267 


U61347 


Leiophyllum buxifolium ribosomal maturase
(matK) gene, chloroplast gene encoding 
chloroplast protein, complete cds 


4.20E-07 


1162 


3756.O07.gz43 533331 


AF177871 


Drosophila melanogaster small GTPase 
RHO1 (Rho1) gene, alternatively spliced
products and complete cds 


5.70E-07 


1163 


3756.O08.gz43 533347 


M60705 


Homo sapiens type I DNA topoisomerase 
gene, exons 19 and 20 


6.00E-06 


1164 


3756.P08.gz43 533348 


M60705 


Homo sapiens type I DNA topoisomerase 
gene, exons 19 and 20 


1.00E-05 


1165 


3759.C01.gz43 533607 


X71874 


H. sapiens genes for proteasome-like subunit 
(MECL-1), chymotrypsin-like protease 

HI) last exon 


4.00E-06 


1166 


3759.D15.gz43 533832 


AL356790 


Human DNA sequence from clone RP11-
238J15 on chromosome 20 Contains ESTs
and GSSs. Contains part of the TOM gene 
for a putative mitochondrial outer 
membrane protein import receptor similar 

Prpl/Zerl and Prp6, complete> 


1.10E-07 


1167 


3759.H08.gz43 533724 


M31684 


D.melanogaster cytoskeleton-like bicaudalD 


2.00E-06


1168 


3759.H15.gz43 533836 


AB046001 


Macaca fascicularis brain cDNA, 
clone:QccE-12738 


2.60E-07 


1169 


3759.H17.gz43 533868 


AE000706 


Aquifex aeolicus section 38 of 109 of the 
complete genome 


1.30E-05 


1170 


3759.H23.gz43 533964 


AK027088 


Homo sapiens cDNA: FLJ23435 fis, clone
HRC12631 


6.20E-34 


1171 


3759.I05.gz43 533677 


AF056433 


oM^C^M 603 Cri-du-chat 


1.70E-07


1172 


3759.I19.gz43 533901 


Z69666 


Human DNA sequence from cosmid 24F8 
from a contig from the tip of the short arm 
of chromosome 16, spanning 2Mb of 
16p13.3. Contains ESTs, repeat
polymorphism and CpG island 


2.06E-04 


1173 


3759.K05.gz43 533679 


L01432 


Soybean calmodulin (SCaM-3) mRNA,
complete cds 


4.10E-08 


1174 


3759.K17.gz43 533871 


Z33340


M.capricolum DNA for CONTIG MC456 


4.00E-06 




1175


3759.L02.gz43 533632




Caenorhabditis elegans stomatin-like 
protein MEC-2 (mec-2) gene, complete cds 


3.70E-05


1176 


3759.L09.gz43 533744 


M11180


Transposon Tn917 (complete), macrolide- 
resistance, complete cds 


1.50E-07 


1177 


3759.L10.gz43 533760 


AF117022


sequence; chloroplast gene for chloroplast 
product 


4.40E-07 


1178 


3759.L15.gz43 533840 


U22657 


Mus musculus genomic locus related to 
cellular morphology 


1.60E-05 


1179 


3759.L24.gz43 533984 


AK022990 


Homo sapiens cDNA FLJ12928 fis, clone
NT2RP2004767 


7.60E-10 


1180 


3759.M19.gz43 533905 


M96324 


Lycopersicon esculentum Ca2+-ATPase 
gene, complete cds 


2.50E-05 


1181 


3759.N08.gz43 533730 


AK005546 


Mus musculus adult female placenta cDNA, 
RIKEN full-length enriched library, 
clone:1600027G01, full insert sequence 


1.30E-07 


1182 


3759.N16.gz43 533858 


AB014079


Homo sapiens genomic DNA, chromosome 
6p21.3, HLA class I region, Cosmid 
clone:TY1E11, complete sequence


3.80E-12 


1183 


3759.N23.gz43 533970 


AK018377


Mus musculus 16 days embryo lung cDNA,
RIKEN full-length enriched library, 
clone:8430403M08, full insert sequence 


5.70E-07 


1184 


3759.O16.gz43 533859


AE000918 


Methanobacterium thermoautotrophicum
from bases 1444576 to 1460617 (section 
124 of 148) of the complete genome 


1.40E-05 


1185 


3759.P03.gz43 533652 


L06066 


Saccharomyces cerevisiae PET117
polypeptide (PET117) gene, complete cds


5.90E-07 


1186 


3759.P13.gz43 533812 


X89414 


A.thaliana DNA for pyrroline-5-carboxylase
synthetase gene 


5.00E-06 


1187 


3759.P15.gz43 533844 


X66979 


X.laevis mRNA XLFLI


1.50E-05 


1188 


3759.P17.gz43 533876 


AF039313 


Moraxella catarrhalis strain LES-1 
transferrin binding protein B (tbpB) gene, 
complete cds 


2.00E-06 


1189 


3762.A09.gz43 534117 


AE000496 


Escherichia coli K12 MG1655 section 386 
of 400 of the complete genome 


1.63E-04 


1190 


3762.A16.gz43 534229


X98371 


D.subobscura sex-lethal gene 


7.00E-06 


1191 


3762.A19.gz43 534277


U95019 


Human voltage-dependent calcium channel 
beta-2c subunit mRNA, complete cds 


6.10E-07 


1192 


3762.A20.gz43 534293 


M10014 


Homo sapiens map 4q28 fibrinogen (FGG) 
gene, alternative splice products, complete 
cds 


8.00E-06 


1193 


3762.B05.gz43 534054 


J05614 


Human proliferating cell nuclear antigen 
(PCNA) gene, promoter region 


1.40E-05 


1194 


3762.B15.gz43 534214 


AJ297559 


Homo sapiens partial PIK3CB gene for 
phosphatidylinositol 3-kinase catalytic
subunit p110beta, exons 15-17


2.50E-05 


1195 


3762.C20.gz43 534295 


M58580 


Rabbit angiotensin-converting enzyme 
(ACE) gene, 5' end 


3.10E-05 


1196 


3762.C23.gz43 534343 


L27146 


Human neurofibromatosis 2 (NF2) gene, 
exon 16 


1.00E-06 


1197 


3762.D03.gz43 534024 


U51305 


Triticum aestivum alpha-gliadin storage 
protein pseudogene, complete cds


1.40E-05 


1198 


3762.D04.gz43 534040 


AF263274 


Chionodraco rastrospinosus isolate Cra7 
alpha tubulin mRNA, complete cds 


3.50E-07 


1199 


3762.D18.gz43 534264 


M94764 


Glycine max cv. Dare nodulin 26 gene 
fragment 


2.50E-05 


1200 


3762.D19.gz43 534280 


AE001446 


Helicobacter pylori, strain J99 section 7 of 
132 of the complete genome 


3.30E-05 


1201 


3762.D22.gz43 534328 


M73962 


Bovine pregnancy-associated glycoprotein 1 
mRNA, complete cds 


4.00E-06 


1202 


3762.E01.gz43 533993 


X63746 


S.cerevisiae rpc34 and fun34 genes for
DNA dependant RNA polymerase c (III) 


4.00E-06 


1203 


3762.E10.gz43 534137 


Z74847 


S.cerevisiae chromosome XV reading frame
ORF YOL105C


1.00E-05 


1204 


3762.E15.gz43 534217 


AF207841 


Pyricularia grisea AVR-Pita (AVR-Pita) 
gene, complete cds 


2.20E-09 


1205 


3762.E23.gz43 534345 


M58600 


Human heparin cofactor II (HCF2) gene, 
exons 1 through 5 


3.60E-37 


1206


3762.F08.gz43 534106 


Z47066 


Human cosmid Qc14G3 from Xq28
contains STSs 


3.10E-09 


1207 


3762.F22.gz43 534330 


AY034974 


Arabidopsis thaliana unknown protein 
(F24J8.3) mRNA, complete cds 


4.20E-07 


1208 


3762.G18.gz43 534267 


Z28150 


S.cerevisiae chromosome XI reading frame
ORF YKL150w


2.00E-06 


1209 


3762.H12.gz43 534172 


AF370230 


Arabidopsis thaliana unknown protein 
(T21P5_16/AT3g03420) mRNA, complete
cds 


6.60E-08 


1210 


3762.I07.gz43 534093


U19569 


Human squamous cell carcinoma antigen 
(SCCA2) gene, exon 1 


4.60E-07 


1211 


3762.J03.gz43 534030 


U22421 


Mus musculus obesity protein (ob) gene,
complete cds 


5.30E-07 


1212 


3762.J18.gz43 534270 


AB027966 


Schizosaccharomyces pombe gene for 
Hypothetical protein, partial cds,
clone:TB89


2.30E-08 


1213 


3762.K02.gz43 534015 


AF273762 


Homo sapiens 3-hydroxy-3-methylglutaryl- 
coenzyme reductase gene, exon 15 


4.40E-14 


1214 


3762.K20.gz43 534303 


K01464


Rat cardiac alpha-myosin heavy chain gene, 
5' flank, 1st 3 exons 


3.00E-06 


1215 


3762.L18.gz43 534272 


Z49438 


S.cerevisiae chromosome X reading frame
ORF YJL163c


4.00E-06 


1216


3762.L20.gz43 534304 


XM_030040


Homo sapiens similar to KIAA0877 protein 
(H. sapiens) (LOC90219), mRNA 


3.00E-06 


1217 


3762.M04.gz43 534049 


AF002237 


Anopheles gambiae clone 227 mRNA 
sequence 


4.00E-06 


1218 


3762.M17.gz43 534257 


M29688 


S.cerevisiae PMS1 gene encoding DNA 
mismatch repair protein, complete cds 


1.40E-08 


1219 


3762.M23.gz43 534353 


M20006 


Chicken tumor 10 c-myc DNA, exons 2 and 
3 


2.90E-09 


1220 


Clu1014734.con 1


AB027966 


Schizosaccharomyces pombe gene for 
Hypothetical protein, partial cds, 
clone:TB89 


3.00E-08 


1221 


Clu1036845.con 1


M34429 


Human PVT-IGLC fusion protein mRNA, 5'
end 


1.37E-03 
END


SCORE 


1.8E-95
1.8E-95
1.8E-95
1.8E-95
1.8E-95
1.8E-95
1.8E-95
1.8E-95
1.8E-95
1.8E-95
1.8E-95
1.8E-95
1.8E-95
1.8E-95
5.8E-37
5.8E-37
1.2E-10
1.2E-10
7.2E-76
7.2E-76
7.2E-76
7.2E-76
7.2E-76
7.2E-76
7.2E-76
4.6E-11
4.8E-43


PFAM DESCRIPTION 


Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Armadillo/beta-catenin-like repeat
Importin beta binding domain
Importin beta binding domain
Core histone H2A/H2B/H3/H4
Core histone H2A/H2B/H3/H4
GTF2I-like repeat
GTF2I-like repeat
GTF2I-like repeat
GTF2I-like repeat
GTF2I-like repeat
GTF2I-like repeat
GTF2I-like repeat
Glutathione S-transferase, N-terminal domain
CBS domain


PFAM NAME 


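Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
Armadillo_seg
IBB
IBB
histone
histone
GTF2I
GTF2I
GTF2I
GTF2I
GTF2I
GTF2I
GTF2I
GST_N
CBS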








L.3 4 | 
1.3 4 


L.3 4 
L.3 4 

L.3 4 


L.3 4 
L.3 4 


L.3 4 
L.3 4 


L.3 4 
L.3 4 


L.3 4 


1.3 4 


.3 10 
















2 


























/ 




















a 


1 


NTP 00451 
NTP 00451 


NTP 00451 
NTP 00451 
NTP 00451 
NTP 00451 


NTP 00451 
NTP_00451 
1 NTP 00451 


NTP 00451 
NTP 00451 


NTP 00451 
NTP 00451 


NTP 00451 


NTP 00451 


NTP 00759 


NTP 00759 


1 


NTP 0078( 


1 


1 


NTP 0078( 






NTP 0088: 


1 NTP 0095! 








eo oo oo ao 


in 


* rf rf 






si 
END


SEQNAME


l 

i 


1 








§ © 




11 








i 








1 










1 


1 


fa 
END












2 










START 






















SCORE 


7.4E-61
7.4E-61
7.4E-61
7.4E-61
6.8E-09
6.8E-09
7.2E-76
7.2E-76
7.2E-76
7.2E-76


PFAM DESCRIPTION 


Cadherin domain
Cadherin domain
Cadherin domain
Cadherin domain
HMG (high mobility group) box
HMG (high mobility group) box
GTF2I-like repeat
GTF2I-like repeat
GTF2I-like repeat
GTF2I-like repeat


PFAM NAME 




cadherin
cadherin
cadherin
cadherin
HMG_box
HMG_box
GTF2I
GTF2I
GTF2I
GTF2I


SEQNAME 


NTP 011430S6.3 6
NTP 011430S6.3 6
NTP 011430S6.3 6
NTP 011430S6.3 6
NTP 017582S2.3 6
NTP 017582S2.3 6
NTP 026331S1.1 1
NTP 026331S1.1 1
NTP 026331S1.1 1
NTP 026331S1.1 1


fa 


3 
























Comment 


invasive 
adenocarcinoma,
moderately
differentiated; 
focal perineural 
invasion is seen 


Hyperplastic 
polyp in 
appendix. 


Perineural 
invasion; donut 


Neg. One 
tubulovillous 
and one tubular 
adenoma with 
no high grade 
dysplasia. 


patient history 
of metastatic 
melanoma 




Dist 
Met 




o 


1 


1 


1 


Dist 
Met& 
Loc 




00 




bO 

Z 




Reg 
Lymph 
Grade 


2 


§ 






s 


Lymph 
Met 
Incid 








o 




Lymph 
Met 


o 

Ph 






I 




Local Invasion 


Extending into 
subserosal adipose 
tissue 


Invasion through 
muscularis 
propria, subserosal 
involvement; 
ileocec. valve 
involvement 


Invasion of 
muscularis propria 
into serosa, 
involving 
submucosa of 
urinary bladder 


Invasion through 
the muscularis 
propria into 
subserosal adipose
tissue. Ileocecal 
junction. 


Invasion of 
muscularis propria 
into pericolonic fat


Histo 
Grade 


s 


s 


O 


s 


S 


Grade 


p ' 


P 


H 


p 


P 


1 




o 

o> 




v© 




Anatom 
Loc 


Ascending 
colon 


Cecum 

_ 


Sigmoid 


Cecum 


Transverse 
colon 


& 
O 








H 


s 


la 

Ph 






o 






PtID 






CN) 
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Comment 




Small separate 
tubular 
adenoma (0.4 
cm) 


Perineural 

invasion 

identified 

adjacent to 

metastatic 

adenocarcinoma.


Separate 
tubulovillous
and tubular 
adenomas 


Dist 
Met 
Grade 




o 




1 


Dist 
Met& 
Loc 


Neg 




Pos- 
Liver 


M 


Reg 
Lymph 
Grade 


g 


o 






Lymph 
Met 
Incid 


10/24 




7/21 


2/13 


Lymph 
Met 


o 


I 


Pos 


Pos 


Local Invasion 


through wall and 
into surrounding 
adipose tissue 


Invasion through 
muscularis propria 
into non- 
peritonealized 
pericolic tissue; 
gross 

configuration is 
annular. 


Invasion of 
muscularis propria 
into pericolonic 
adipose tissue, but 
not through serosa.
Arising from 
tubular adenoma. 


Invasion through 
muscularis propria
into 

subserosa/pericolic 
adipose, no serosal 
involvement. 
Gross 

configuration 
annular. 


Histo 
Grade 




O 


s 


8 


Grade 


P 


H 


p 


H 


Size 










Anatom 
Loc 


Splenic 
flexure 


Rectum 


Cecum 


Hepatic 
flexure 


Grp 




H 




s 


Is 

Ph 






o 




PtID
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Comment 


Hyperplastic 
polyps 


Tubulovillous 
adenoma with 
high grade 
dysplasia 






Descending 
colon polyps, 
no HGD or 
carcinoma 
identified.


Dist 
Met 
Grade 


MX 


M0


MX 


M0 


M0 


Dist 
Met& 
Loc 


Z 


Z 


Pos-
Mesenteric
deposit


Z 


Neg 


Reg 
Lymph 
Grade 


z 


Z 


Z 


g 


8 


Lymph 
Met 
Incid 




0/10 


0/15 


0/12 


7/10 


Lymph 
Met 


Pos 




1 


Z 


Pos 


Local Invasion 


Invasion through 
muscularis propria
to involve 
subserosal, 
perirectal
adipose, and 
serosa 


Invasion through 
muscularis propria 
into subserosal 
adipose tissue. 


Invades through
muscularis propria 
to involve 
pericolonic 
adipose, extends to 
serosa. 


Invades full 
thickness of 
muscularis 
propria, but 
mesenteric adipose 
free of malignancy 


Invasion into 
perirectal adipose 
tissue. 


Histo 
Grade 


o 


s 


O 


S 


8 


Grade 


H 


H 


S 


s 


H 


Size 






Os 






Anatom 
Loc 


Rectum 


Ascending
colon 


Transverse 
colon 


Cecum 


Rectum 


Grp












la 










> 


PtID 




-* 


<N 
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Comment 


Tubulovillous 
adenoma (2.0 
cm) with no 
high grade 
dysplasia. Neg. 
liver biopsy. 


1 hyperplastic 
polyp identified 






Two mucosal 
polyps


Tumor arising 
at prior 
ileocolic 
surgical 
anastomosis. 


Dist 
Met 


1 


1 


MX 


1 


1 




Dist 
Met& 
Loc 


Neg 


bo 


to 


% 




Pos- 
Liver 


Reg 
Lymph 
Grade 








% 






Lymph 

Met
Incid 


2/12 




-* 
o 


o 






Lymph 
Met 


o 

Ph 




be 
J? 






Ph 


Local Invasion 


Invasion through


and invades 
pericolic adipose 
tissue. Ileocecal 
junction. 


Extends into 
perirectal fat but
does not reach 
serosa 


Invasion through 
muscularis propria 
to involve 
pericolonic fat 
Arising from 
villous adenoma. 


Through colon 
wall into 

subserosal adipose 
tissue. No serosal 
spread seen. 


Invasion thru 
muscularis propria 
to pericolonic fat 


Invasion through 
muscularis propria 
into subserosal 
adipose tissue, not 
serosa. 


Histo 
Grade 


S 


S 


O 


a 


s 


S 


Grade 

| 


H 


H 


H 


H 


H 


H 


Size 






"ill 








Anatom 
Loc 


Cecum 


1 
o 

o 


Ascending 
colon 


Sigmoid 


Ascending 
colon 


Ascending 
colon 


Grp 


H 


H 






H 










O 






3 


a 


o5 




rn 
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Comment 




rediagnosis of 
oophorectomy 
path to 
metastatic 
colon cancer. 


Anatomical 
location of 
primary not 
notated in 
report. 
Evidence of 
chronic colitis. 


No mention of 
distant met in 
report 


Dist 
Met 
Grade 


1 






1 


Dist 
Met& 
Loc 




Pos- 
Liver 


Pos- 
Liver 


I 


Reg 
Lymph 
Grade 






2 




Lymph 
Met 
Incid 










Lymph 
Met 








o 

Ph 


Local Invasion 


Cecum, invades 
through 

muscularis propria 
to involve 
subserosal adipose 
tissue but not 
serosa. 


Invasive through 
muscularis to 
involve periserosal 
fat; abutting 
ileocecal junction. 


Invasion through 
muscularis propria 
involving pericolic 
adipose, serosal 
surface uninvolved 


penetrates 
muscularis 
propria, involves 
pericolonic fat. 


Histo 
Grade 


$ 


S 


O 


O 


Grade 






P 


H 












Anatom 
Loc 


Cecum 


Cecum 




Sigmoid 


O 










|a 










PtID 
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Comment 


Omentum with 
fibrosis and fat 
necrosis. Small 
bowel with 
acute and 
chronic 
serositis, focal 
abscess and 
adhesions. 




Appendix 
dilated and 
fibrotic, but not 
involved by 
tumor 


Dist 
Met 


1 




1 


Dist 
Met& 
Loc 


I 


Pos- 
Liver 




Reg 
Lymph 
Grade 


o 


S 


1 


Lymph 
Met 
Incid 






© 


Lymph 
Met 


IP 


1 




Local Invasion 


Invasion through 
the muscularis 
propria involving 
pericolic fat. 
Serosa free of 
tumor. 


Invasion through 
muscularis propria 
extensively 
through 

submucosal and 
extending to 
serosa. 


Invasion through 
the bowel wall, 
into subserosal
adipose. Serosal 
surface free of 
tumor. 


Histo 
Grade 


s 


s 


B 


Grade 


p 


p 


P 


33 








Anatom 
Loc 


Ascending 
colon 


Ascending 
colon 


Cecum 


5 


H 




H 


la 

Ph 






vo 


PtID
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Comment 


moderately 

differentiated 

adenocarcinoma

with

mucinous 

differentiation

(% not stated), 

tubular 

adenoma and 

hyperplastic

polyps present, 


invasive poorly 
differentiated 
adenosquamous 
carcinoma 


moderately 

differentiated 

invasive 

adenocarcinoma


Peritumoral 
lymphocytic 
response; 5 LN 
examined in 
pericolic fat, no 
metastases
observed. 


Three fungating


lesions 
examined. 


Dist 
Met 












Dist 
Met& 
Loc 


53 


Pos- 
Liver 


Pos- 
Liver 


60 


Pos- 
Liver 


Reg 
Lymph 
Grade 


o 






g 




Lymph 
Met 
Incid 






o 




o 


Lymph 
Met 




o 


!z 


Z 


fS 


Local Invasion 


extending through 
bowel wall into 
serosal fat 


through 

muscularis propria 
into pericolic soft 
tissues 


through 

muscularis propria 
into pericolic fat 
but not at serosal 
surface 


Invasion of 
muscularis propria 
into soft tissue 


Extending through


muscularis propria 
into pericolonic fat 


Histo 
Grade 


B 


s 


B 


G2-G3 


G2-G3 


Grade 


P 


p 


P 


P 


P 


1 


o 










Anatom 
Loc 


Cecum 


Ascending
colon 


Descending
colon


Rectosigmoid


Cecum




& 
O 








a 




|e 

Ph 






© 


o 


© 


PtID




oo 
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Comment 




poorly 

differentiated 

invasive 

colonic 

adenocarcinoma


well to 

moderately 

differentiated 


as; this patient 
has tumors of 
the ascending 
colon and the 
sigmoid colon 


moderately 
differentiated 
adenocarcinoma




Perineural 

invasion 

present. 


Dist 
Met 














Dist 
Met& 
Loc 


Pos- 
Liver 


Pos- 
Liver 


Pos- 
Liver 


Pos- 
Liver 


Pos- 
Liver 


Pos- 
Liver 


Reg 
Lymph 
Grade 








2 






Lymph 
Met 
Incid 




13/25 






11/15 




Lymph 
Met 


Ph 






1 


Ph 


Ph 


Local Invasion 


Invading through 
muscularis propria 
into perirectal fat 


Through the 
muscularis propria 
into pericolic fat 


Into muscularis 
propria 


Through 

muscularis propria 
into subserosal
tissue 


Through 

muscularis propria 
into subserosa. 


Invasion through 
muscularis propria 


tissue 


Histo 
Grade 


G1-G2 


8 


3 


<^ 
O 


$ 


B 


Grade 


H 


H 




H 


H 


P 


GO | 


vd 












Anatom 
Loc 


Rectum 


Ascending 
colon 


Ascending 
colon 


Cecum 


Ascending 
colon 


Rectum


(3 


& 












Ph 


o\ 
o 




o 


© 






Q 








So 
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Comment 


Perineural 

invasion 

present, 

extensive. 

Patient with a 

history of colon 

cancer. 


Perineural 
invasion focally 
present. 
Omentum 
mass, but 
resection with 
no tumor 
identified. 


Primary 
adenocarcinoma
arising from
tubulovillous 
adenoma. 


Dist 
Met 








Dist 
Met& 
Loc 


Pos- 
Liver, 
left and 
right 
lobe, 
omentu 


Pos- 
Liver 


Pos- 
Liver 


Reg 
Lymph 
Grade 




Z 




Lymph 
Met 
Incid 


1/28 


14/17 




Lymph 
Met 


Pos 


Pos 


Pos 


Local Invasion 


Invasion into 
pericolic soft
tissue. Tumor 
focally invading 
skeletal muscle 
attached to colon. 


Through 

muscularis propria 
into pericolic fat 


Invasion through 
colon wall and 
focally involving 
subserosal tissue. 


Histo 
Grade 


O 


G2-G3 


S 


Grade 


H 


H 


H 


Size 


© 


© 


© 


Anatom 
Loc 


1 


Transverse 
colon 


Sigmoid 


Grp 








la 






1009 


PtID 






00 
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COLON 


RATIOS




























LZ 




11 




33.33333




60.71429


60.71429




35.71429








42.9


33.3


44.4


55.6


55.6


COLON 


RATIOS
































COLON 
PATIENTS 

>=2x 




41.025641


37.5






57.5


35.2941176




63.4146341






46.2


48.7


61.5


61.5


BREAST 


RATIOS
































BREAST 


if 

t 
















47.0588235
















CLONE ID 


M00084443A:E10


M00084700A:C10


M00085031B:E03


M00085171D:F05


M00085222D:D07


M00086277B:E06


M00085835B:E11


u 
1 


M00085100B:C12














SEQNAME 


3544.G06.GZ43 505397


3559.B18.GZ43 507504


3590.D19.GZ43 512389


3596.P03.GZ43 512529


3599.K02.GZ43 512892


3665.O06.gz43 521001


3756.K15.gz43 533455


3756.M06.gz43 533313


3759.P15.gz43 533844


NT 007592S2.3 10


NT 009296S1.3 1










NT 009296S1.


NT 009296S1


NT 017582S2


NT 017582S2.


a 

I 


JO 








1 


© 
















1 


1 
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Table 15 



Library ID 


CMCC Number 


Clone ID


NRRL Number 


ES219 


5471 


M00084879B:E01 




ES219 


5471 


M00083819B:E10 


B-30523


ES219 


5471 


M00084942C:B10 


B-30523


ES219 


5471 


M00084704C:B09 


B-30523


ES219 


5471 


M00084887C:C07 


B-30523 


ES219 


5471 


M00084976B:A08 


B-30523 


ES219 


5471 


M00085011B:A01 


B-30523 


ES219 


5471 


M00084961A:C07 




ES219 


5471 


M00084960D:D02 


B-30523


ES219 


5471 


M00084973A:B06 


B-30523


ES219 


5471 


M00084928D:F06 


B-30523


ES219 


5471 


M00084968C:D10 


B-30523


ES219 


5471 


M00084973A:B06


B-30523


ES219 


5471 


M00084966A:A08


B-30523


ES219 


5471 


M00084919C:B04 


B-30523 


ES219 


5471 


M00085003C:D03 




ES219 


5471 


M00084968A:D01 


B-30523 


ES219 


5471 


M00084969D:C11 




ES219 


5471 


M00084899D:B01 


B-30523


ES219 




M00084893C:A12


B-30523


ES219 


5471 


M00084890D:F09 


B-30523


ES219 


5471 


M00084904A:D03 


B-30523


ES219 


5471 


M00085029A:C02 


B-30523 


ES219 


5471 


M00084963D:D07 


B-30523 


ES219 


5471 


M00085147C:A04 




ES219 


5471 


M00085144B:C12 


B-30523


ES219 


5471 


M00085124B:G05 


B-30523 


ES219 


5471 


M00085702B:G11 


B-30523


ES219 


5471 


M00085203A:E06 


B-30523 


ES219 


5471 


M00085242A:C06 




ES219 


5471 


M00084980D:H08 


B-30523


ES219 


5471 


M00085187B:C11 


B-30523 


ES219 


5471 


M00085021C:F06 


B-30523 


ES219 


5471 


M00085182B:E04 




ES219 


5471 


M00084930D:B08 


B-30523


ES219 


5471 


M00084941B:E07 


B-30523


ES219 




M00084424D:G07 


B-30523


ES219 


5471


M00084938B:F12 


B-30523 


ES219 


5471 


M00084853D:G03 




ES219 


5471 


M00084878B:B12 


B-30523


ES219 


5471 


M00084889B:C02 


B-30523






M00084885D:A12


B-30523


ES219 


5471


M00084845A:E02 


B-30523


ES219 


5471 


M00084972B:H03 


B-30523


ES219 


5471 


M00084908A:F03 


B-30523 


ES219 


5471 


M00084975A:G05 


B-30523 


ES219 


5471 


M00084941C:H04 


B-30523 


ES219 


5471 


M00084997D:H09 


B-30523 


ES219 


5471 


M00084491A:E08


B-30523 


ES219 


5471 


M00083815C:H08 


B-30523 


ES219 


5471 


M00084501A:D06 


B-30523 


ES219 


5471 


M00084558D:G08 


B-30523 


ES219 


5471 


M00084510C:F02 


B-30523 


ES219 


5471 


M00084521C:H11 


B-30523 


ES219 


5471 


M00084446A:A05 


B-30523 


ES219 


5471 


M00084458A:G06 


B-30523 


ES219 


5471 


M00084377D:E08 


B-30523 


ES219 


5471 


M00084382A:D06


B-30523 


ES219 


5471 


M00083816B:D08


B-30523 


ES219 


5471 


M00084449B:C09


B-30523 


ES219 


5471 


M00084431C:B02 


B-30523 


ES219 


5471 


M00084463A:B07 


B-30523 


ES219 


5471 


M00084487D:F04 


B-30523 


ES219 


5471 


M00083800C:E07 


B-30523 


ES219 


5471 


M00084468C:E07 


B-30523 


ES219 


5471 


M00084638A:E10 


B-30523 


ES219 


5471 


M00084439B:A08


B-30523 


ES219 


5471 


M00084479D:E10 


B-30523 


ES219 


5471 


M00084455D:B03


B-30523 


ES219 


5471 


M00084368D:C02


B-30523 


ES219 


5471 


M00084642D:E08 


B-30523 


ES219 


5471 


M00084373A:F08 


B-30523 


ES219 


5471 


M00084364C:B06 


B-30523 


ES219 


5471 


M00084521B:E11 


B-30523 


ES219 


5471 


M00084385B:D03 


B-30523 


ES219 


5471 


M00084443C:H06 


B-30523 


ES219 


5471 


M00083803C:F03 


B-30523 


ES219 


5471 


M00084421C:B11


B-30523 


ES219 


5471 


M00084434B:E06 


B-30523 


ES219 


5471 


M00083820B:C03 


B-30523 


ES219 


5471 


M00084246B:H03 


B-30523 


ES219 


5471 


M00084484C:B11 


B-30523 


ES219 


5471 


M00084410C:F10 


B-30523 


ES219 


5471 


M00083801B:H03


B-30523 


ES219 


5471 


M00084980C:B07 


B-30523 


ES219 


5471 


M00084499C:C11 


B-30523 


ES219 


5471 


M00084526C:G09 


B-30523 


ES219 


5471 


M00084406C:A01 


B-30523 


ES219 




M00084380D:B07 


B-30523 


ES219 


5471 


M00084383B:A11 


B-30523 




5471 


M00083834C:E02 


B-30523 


ES219 


5471 


M00083839A:H03


B-30523 


ES219 


5471 


M00084505C:H08 


B-30523 






M00084511D:A02 


B-30523 


ES219 


5471 


M00084494C:C01 


B-30523 


ES219 


5471 


M00084451D:F06 


B-30523 


ES219 


5471 


M00084604A:D02 


B-30523 


ES219 


5471 


M00084771D:G03


B-30523 


ES219 


5471 


M00084817A:H11 


B-30523 


ES219 


5471 


M00084827D:D04


B-30523 


ES219 


5471 


M00084843D:C06 


B-30523 


ES219 


5471 


M00084750C:B08 


B-30523 


ES219 


5471 


M00084757A:D01


B-30523 


ES219 


5471 


M00084771D:A01 


B-30523 


ES219 


5471 


M00084730B:A09 


B-30523 


ES219 


5471 


M00084826B:E11 


B-30523 


ES219 


5471 


M00084595C:C07 


B-30523 


ES219 


5471 


M00084724A:C02


B-30523 


ES219 


5471 


M00084833A:G07


B-30523 


ES219 


5471 


M00084600D:B10 


B-30523 


ES219 


5471 


M00084634C:H02 


B-30523 


ES219 


5471 


M00084614D:A08


B-30523 


ES219 


5471 


M00084620B:F05


B-30523 


ES219 


5471 


M00084607A:B03 


B-30523 


ES219 


5471 


M00084633A:B12 


B-30523 


ES219 


5471


M00084597A:F06 


B-30523 


ES219 


5471 


M00084575A:A11


B-30523 


ES219 


5471 


M00084547B:B10 


B-30523 


ES219 


5471 


M00084525A:E08 


B-30523 


ES219 


5471 


M00084578B:E12 


B-30523 


ES219 


5471 


M00084669A:A05


B-30523 


ES219 


5471 


M00084419C:A09


B-30523 


ES219 


5471 


M00084769C:H03


B-30523 


ES219 


5471 


M00085007A:B03 


B-30523 


ES219 


5471 


M00084865D:B04 


B-30523 


ES219 


5471 


M00084743D:G01 


B-30523 


ES219 


5471 


M00084770B:G12 


B-30523 


ES219 


5471 


M00084584B:A02


B-30523 


ES219 


5471 


M00084647C:E12


B-30523 


ES219 


5471 


M00084766D:F12 


B-30523 


ES219 


5471 


M00084648D:F05


B-30523 


ES219 


5471 


M00084843A:D06


B-30523 


ES219 


5471 


M00084709C:B02 


B-30523 


ES219 


5471 


M00084834B:G02


B-30523 


ES219 


5471 


M00084718D:C04


B-30523 


ES219 


5471 


M00084702B:C12 


B-30523 


ES219 


5471 


M00084645D:G02 


B-30523 


ES219 


5471 


M00084849B:F11 


B-30523 


ES219 


5471 


M00084859C:H05 


B-30523 


ES219 


5471 


M00084850D:H02 


B-30523 


ES219 


5471 


M00084857B:A09


B-30523 


ES219 


5471 


M00084867A:C11 


B-30523 


ES219 


5471 


M00084823A:H01 


B-30523 


ES219 


5471 


M00084756B:H01


B-30523 


ES219 


5471 


M00084700D:E09


B-30523 


ES219 


5471 


M00085010C:H01 


B-30523 


ES219 


5471 


M00085060B:C05 


B-30523 




5471 


M00085012C:A08 


B-30523 


ES219 


5471 


M00085047D:F03 


B-30523 


ES219 


5471 


M00085049B:E03 


B-30523 


ES219 


5471 


M00085051C:A01 


B-30523 


ES219 


5471 


M00085050A:E11 


B-30523 


ES219 


5471 


M00085676C:C04


B-30523 


ES219 


5471 


M00085121A:D10 


B-30523 


ES219 


5471 


M00085166D:C10


B-30523 


ES219 


5471 


M00084992D:B02 


B-30523 


ES219 


5471 


M00085148B:H01 


B-30523 


ES219 


5471 


M00085123B:C04 


B-30523


ES219 


5471 


M00085173B:A08


B-30523 






NiWKJQJ 1 /ZL.rUo


B-30523


ES219


5471


M00084937D:B04








M00085026D:A01


B-30523


ES219


5471


M00084994D:F11


B-30523


ES219


5471


M00085190C:D10


B-30523


ES219


5471


M00085194D:F04


B-30523


ES219


5471


M00085222D:D07


B-30523


ES219


5471


IViUUUo JZZ J A. OUl


B-30523


ES219


5471


lVJLUUUo4 /4UL/..E5UO


B-30523


ES219


5471


JVllHJUOoUJOA.VJ 1Z


B-30523


ES219


5471


AvTTlf»flQA/^71 A -PI 0
IV1UUU04O/ lA.Ulz




ES219


5471


TV/rflf)f)8A57 1 P-'nfK
lWUUUo4o / IL/.iyUj


B-30523


ES219


5471


1V1UUU64 J5 iD.rVJ 1


B-30523


ES219


5471


lviUUU04j OZL/.xlUJ




ES219


5471


1WUUU640 1 iSO.Alo


B-30523


ES219


5471


M00084687A:A03


B-30523


ES219


5471


M00085038A:C06


B-30523


ES219


5471


M00084722A:H12


B-30523


ES219


5471


M00084676D:E02


B-30523


ES219


5471


M00084615D:H12


B-30523


ES219


5471


M00084659C:G05


B-30523


ES219


5471


M00084536B:A03


B-30523


ES219


5471


A/fnnnR/iQ7QP'nn7
iviuuuo4yzyu.-t>uz


B-30523


ES219


5471


M00084652D:G11


B-30523


ES219


5471


M00084611B:A11


B-30523


ES219


5471


M00084530D:G07


B-30523


ES219


5471


M00084527C:H07


B-30523


ES219


5471


M00084545C:C05


B-30523


ES219


5471


M00084535D:C12


B-30523


ES219


5471


M00084684C:D02


B-30523


ES219


5471


M00084679D:G12


B-30523


ES219


5471


i\/rnnr»fi/i7id. a -170/1
JViUUUo4 / ,?4A.i}U4


B-30523


ES219


5471


lViUUUo4oyolJ.riU4




ES220


5472


MUllUo4 /Z4L7.rU4


B-30524


ES220


5472


A,rnfins/i55cm -pin


B-30524


ES220


5472


ivj.uuuo4 i v i u.tt\)j


B-30524


ES220


5472


1iAC\[\f\SLA <i 5tvxja 1




ES220


5472


M00084578C:G09




ES220


5472




B-30524


ES220


5472


1VJlUUI'o4j J /.D.l^lO


B-30524


ES220


5472


A/fnrinfi/i 5^ri a -^ns


B-30524




5472


JVlUUUoo IZVrJ.CUZ


B-30524


ES220




lviUUUo40zU.LJ.ii ID


B-30524


ES220


5472


MUUUo4j /OA. IMZ




ES220


5472


M00084720A:A01


B-30524


ES220 


5472 


M00084654A:E04 


B-30524 


ES220 


5472 


M00084596D:E10 


B-30524 


ES220 


5472 


M00084646A:D02 


B-30524 


ES220 


5472 


M00084572D.-F07 


B-30524 


ES220 


5472 


M00084620A:E08 


B-30524 


ES220 


5472 


M00084553B:F04 


B-30524 


ES220 


5472 


M00084614D:B07 


B-30524 


ES220 


5472 


M00084604D:D08 


B-30524 


ES220 


5472 


M00084722D:A03 


B-30524 


ES220 


5472 


M00084958C:B03 


B-30524 


ES220 


5472 


M00084523C:A05 


B-30524 


ES220 


5472 


M00085166C:A08 


B-30524 


ES220 


5472 


M00084467A:D06 


B-30524 


ES220 


5472 


M00084890C:A06


B-30524 


ES220 


5472 


M00084609C:F10 


B-30524 


ES220 


5472 


M00084413C:A11


B-30524 


ES220 


5472 


M00084834A:A03


B-30524 


ES220 


5472 


M00085172A:G05 


B-30524 


ES220 


5472 


M00085146B:C01 


B-30524 


ES220 


5472 


M00085038A:B10 


B-30524 


ES220 


5472 


M00084246A:D03 


B-30524 


ES220 


5472 


M00084967B:D09 


B-30524 


ES220 


5472 


M00085035D:E04 


B-30524 


ES220 


5472 


M00084736B:H03


B-30524 


ES220 


5472 


M00085025A:D11


B-30524 


ES220 


5472 


M00084900C:A04 


B-30524 


ES220 


5472 


M00085127C:C03 


B-30524 


ES220 


5472 


M00084424A:G07 


B-30524 


ES220 


5472 


M00085131D:A06 


B-30524 


ES220 


5472 


M00084987B:H12 


B-30524 


ES220 


5472 


M00084967C:D10


B-30524 


ES220 


5472 


M00084420A:G02


B-30524 


ES220 


5472 


M00084452B:F07 


B-30524 


ES220 


5472 


M00084705C:D01


B-30524


ES220 


5472 


M00085156A:G04


B-30524 


ES220 


5472 


M00084447D:F03 


B-30524 


ES220 


5472 


M00084495B:C11


B-30524 


ES220 


5472 


M00084745A:A08


B-30524 


ES220 


5472 


M00084458B:G05 


B-30524 


ES220 


5472 


M00084449C:C01 


B-30524 


ES220 


5472 


M00084867B:A03 


B-30524 


ES220 


5472 


M00084680A:F08 


B-30524 


ES220 


5472 


M00084585B:D06 


B-30524 


ES220 


5472 


M00084835D:H03 


B-30524 


ES220 


5472 


M00084685C:B12 


B-30524 


ES220 


5472 


M00084500C:D01 


B-30524 


ES220 


5472 


M00084469A:C09 


B-30524 


ES220 


5472 


M00084381C:A05 


B-30524 


ES220 


5472 


M00084477C:C07 


B-30524 


ES220 


5472 


M00084647C:A05 


B-30524 


ES220 


5472 


M00084687C:F12 


B-30524 


ES220 


5472 


M00084756C:H01 


B-30524 


ES220 


5472 


M00084565D:F08 


B-30524 


ES220 


5472 


M00084560B:F12 


B-30524 


ES220 


5472 


M00084640D:A08 


B-30524 


ES220 


5472 


M00084443A:E10 


B-30524 


ES220 


5472 


M00084521A:E11 


B-30524 


ES220 


5472 


M00085019C:D05


B-30524 


ES220 


5472 


M00084587C:A07


B-30524 


ES220 


5472 


M00084616A:G03 


B-30524 


ES220


5472 


M00084732B:A04 


B-30524 


ES220 


5472 


M00084666A:C04 


B-30524 


ES220 


5472 


M00084633A:H05 


B-30524 


ES220 


5472 


M00084510C:F05 


B-30524 


ES220 


5472 


M00084648B:F06 


B-30524 


ES220 


5472 


M00084700D:H04 


B-30524 


ES220


5472 


M00084506C:A05 


B-30524 


ES220 


5472 


M00084475C:G11 


B-30524 


ES220 


5472 


M00084673B:H11 


B-30524 


ES220 


5472 


M00084595D:D08 


B-30524 


ES220 


5472 


M00084636C:A06 


B-30524 


ES220 


5472 


M00084612C:B01 


B-30524 


ES220 


5472 


M00084644A:H05 


B-30524 


ES220 


5472 


M00084602D:B09 


B-30524 


ES220 


5472 


M00084584B:G07 


B-30524 


ES220 


5472 


M00084678C:C11 


B-30524 


ES220 


5472 


M00084546C:C06 


B-30524 


ES220 


5472 


M00084755A:D02 


B-30524 


ES220 


5472 


M00084536D:F07 


B-30524 


ES220 


5472 


M00084699A:G05 


B-30524 


ES220 


5472 


M00084438D:H04 


B-30524 


ES220 


5472 


M00084766B:F02 


B-30524 


ES220 


5472 


M00084703B:D09


B-30524 


ES220 


5472 


M00084856B:D03 


B-30524 


ES220 


5472 


M00084857A:G05 


B-30524 


ES220 


5472 


M00084868B:D01


B-30524 


ES220 


5472 


M00084823D:E05 


B-30524 


ES220 


5472 


M00084485C:B04 


B-30524 


ES220 


5472 


M00084910D:E07 


B-30524 


ES220 


5472 


M00084996B:D08 


B-30524 


ES220 


5472 


M00084487D:F07 


B-30524 


ES220 


5472 


M00084824C:C10 


B-30524 


ES220 


5472 


M00084949B:B12 


B-30524 


ES220 


5472 


M00084746B:B04 


B-30524 


ES220 


5472 


M00084944D:E05 


B-30524 


ES220 


5472 


M00084851C:F10 


B-30524 


ES220 


5472 


M00084849A:H08 


B-30524 


ES220 


5472 


M00084843A:G01 


B-30524 


ES220 


5472 


M00084921C:E04 


B-30524 


ES220 


5472 


M00084742A:F07 


B-30524 


ES220 


5472 


M00083799D:F10 


B-30524 


ES220 


5472 


M00084760D:D09 


B-30524 


ES220 


5472 


M00084845C:H05 


B-30524 


ES220 


5472 


M00084927A:C01 


B-30524 


ES220 


5472 


M00084935B:E10


B-30524 


ES220 


5472 


M00084974D:F11 


B-30524 


ES220 


5472 


M00084935C:E07


B-30524 


ES220 


5472 


M00084503D:G10 


B-30524 


ES220 


5472 


M00084907C:C01


B-30524 


ES220 


5472 


M00084893C:B01 


B-30524 


ES220 


5472 


M00083803B:F11 


B-30524 


ES220 


5472 


M00084945A:D10 


B-30524 


ES220 


5472 


M00084765B:A10


B-30524 


ES220 


5472 


M00084455D:G03 


B-30524 


ES220 


5472 


M00084874D:E03


B-30524 


ES220 


5472 


M00084889C:A04 


B-30524 


ES220 


5472 


M00084846B:H07


B-30524 


ES220 


5472 


M00084967B:B10 


B-30524 


ES220 


5472 


M00084838A:F12 


B-30524 


ES220 


5472 


M00084885A:C01 


B-30524 


ES220 


5472 


M00084823A:H06 


B-30524 


ES220 


5472 


M00084958B:E10


B-30524 


ES220 


5472 


M00084399B:E05 


B-30524 


ES220 


5472


M00084880B:D03 


B-30524 


ES220 


5472 


M00084877D:G07 


B-30524 


ES220 


5472 


M00084406B:C03 


B-30524 


ES220 


5472 


M00084856B:A12 


B-30524 


ES220 


5472 


M00084888D:A11


B-30524 


ES220 


5472 


M00083831D:H11 


B-30524 


ES220 


5472 


M00084481D:C06 


B-30524 


ES220 


5472 


M00083834B:F09 


B-30524 


ES220 


5472 


M00084707D:B08 


B-30524 


ES220 


5472 


M00084976C:C12 


B-30524 


ES220 


5472 


M00085201C:C12 


B-30524 


ES220 


5472 


M00084379D:A05


B-30524 


ES220 


5472 


M00084392C:G06 


B-30524 


ES220 


5472 


M00084492B:F03 


B-30524 


ES220 


5472 


M00085697B:G05 


B-30524 


ES220 


5472 


M00085683B:B10 


B-30524 


ES220 


5472 


M00084988B:B08 


B-30524 


ES220 


5472 


M00084969C:H11 


B-30524 


ES220 


5472 


M00084988C:G03 


B-30524 


ES220 


5472 


M00085123A:E07 


B-30524 


ES220 


5472 


M00084988C:A04 


B-30524 


ES220 


5472 


M00084363A:C02 


B-30524 


ES220 


5472 


M00084975A:G05 


B-30524 


ES220 


5472 


M00084431C:G08 


B-30524 


ES220 


5472 


M00084972B:H03 


B-30524 


ES220 


5472 


M00084376A:E06


B-30524 


ES220 


5472 


M00084859C:D09 


B-30524 


ES220 


5472 


M00084957B:H07 


B-30524 


ES220 


5472 


M00085053D:D04 


B-30524 


ES220 


5472 


M00084425A:A01


B-30524 


ES220 


5472 


M00084367D:E06 


B-30524 


ES220 


5472 


M00084938B:A11 


B-30524 


ES220 


5472 


M00085051A:G03 


B-30524 


ES220 


5472 


M00083817B:G09 


B-30524 


ES220 


5472 


M00085229B:C10


B-30524 


ES220 


5472 


M00085178D:F01 


B-30524 


ES220 


5472 


M00084980D:D02 


B-30524 


ES220 


5472 


M00085228B:C10 


B-30524 


ES220 


5472 


M00085243A:D07 


B-30524 


ES220 


5472 


M00085031B:E03 


B-30524 


ES220 


5472 


M00085164C:G05 


B-30524 


ES220 


5472 


M00085031C:D05 


B-30524 


ES220 


5472 


M00084251D:C05 


B-30524 


ES220 


5472 


M00085027A:C02 


B-30524 


ES220 


5472 


M00084248D:H09 


B-30524 


ES220 


5472 


M00085209C:F11 


B-30524 


ES220 


5472 


M00084368D:D03 


B-30524 


ES220 


5472 


M00083818A:E09 


B-30524 


ES220 


5472 


M00084980C:E06 


B-30524 


ES220 


5472 


M00084248B:C06


B-30524 


ES220 


5472 


M00085244C:D03 


B-30524 


ES220 


5472 


M00084987A:D09 


B-30524 


ES220 


5472 


M00084994A:H04 


B-30524 


ES220 


5472 


M00084970D:E08 


B-30524 


ES220 


5472 


M00085038D:D10 


B-30524 


ES220 


5472 


M00085035B:C12 


B-30524 


ES220 


5472 


M00085184D:B08


B-30524 


ES221 


5473 


M00084666C:A06 


B-30525 


ES221 


5473 


M00084657C:E01 


B-30525 


ES221 


5473 


M00084540B:B08 


B-30525 


ES221 


5473 


M00084415C:C05 


B-30525 


ES221 


5473 


M00084812A:C02


B-30525 


ES221 


5473 


M00084396B:B03 


B-30525 


ES221 


5473 


M00084844B:H08 


B-30525 


ES221 


5473 


M00084877D:H09 


B-30525 


ES221 


5473 


M00084925C:G01 


B-30525 


ES221 


5473 


M00084970A:C11 


B-30525 


ES221 


5473 


M00084961C:F01 


B-30525 


ES221 


5473 


M00084391B:D06 


B-30525 


ES221 


5473 


M00084694D:F04 


B-30525 


ES221 


5473 


M00084698B:D02 


B-30525 


ES221 


5473 


M00084388A:G03 


B-30525 


ES221 


5473 


M00084973A:C01 


B-30525 


ES221 


5473 


M00084423C:G11 


B-30525 


ES221 


5473 


M00084497D:D03 


B-30525 


ES221 


5473 


M00084889D:G06 


B-30525 


ES221 


5473 


M00084959B:C07 


B-30525 


ES221 


5473 


M00084432B:C05 


B-30525 


ES221 


5473 


M00084489A:D12 


B-30525 




5473 


M00084748A:D09 


B-30525 


ES221 


5473 


M00084962C:F10 


B-30525 


ES221 


5473 


M00084767B:D10 


B-30525 


ES221 


5473 


M00084711B:A05 


B-30525 


ES221 


5473 


M00084743A:E03 


B-30525 


ES221 


5473 


M00084466B:E01 


B-30525 


ES221 


5473 


M00084450C:A09


B-30525 


ES221 


5473 


M00084492C:B05 


B-30525 


ES221 


5473 


M00084487B:A06


B-30525 


ES221 


5473 


M00084480B:A05 


B-30525 


ES221 


5473 


M00084764D:G08 


B-30525 


ES221 


5473 


M00084743D:H04 


B-30525 


ES221 


5473 


M00084891D:A02 


B-30525 


ES221 


5473 


M00084822C:D06 


B-30525 


ES221 


5473 


M00084853D:A12 


B-30525 


ES221 


5473 


M00084822B:G11 


B-30525 


ES221 


5473 


M00084756D:C04 


B-30525 


ES221 


5473 


M00084839C:B09 


B-30525 


ES221 


5473 


M00084767D:B04 


B-30525 


ES221 


5473 


M00084703A:E04 


B-30525 


ES221 


5473 


M00084853A:F08 


B-30525 


ES221 


5473 


M00084956C:G09 


B-30525 


ES221 


5473 


M00084908C:F07 


B-30525 


ES221 


5473 


M00084902B:A10 


B-30525 


ES221 


5473 


M00084833D:B04 


B-30525 


ES221 


5473 


M00085023D:E11 


B-30525 


ES221 


5473 


M00085151A:B04 


B-30525 


ES221 


5473 


M00085039D:F09 


B-30525 


ES221 


5473 


M00085169A:H12 


B-30525 


ES221 


5473 


M00085052B:E04


B-30525 


ES221 


5473 


M00085171D:F05 


B-30525 


ES221 


5473 


M00085050A:B06 


B-30525 


ES221 


5473 


M00085155B:F10 


B-30525 


ES221 


5473 


M00085123C:G11 


B-30525 


ES221 


5473 


M00085182B:H10 


B-30525 


ES221 


5473 


M00084675A:E02 


B-30525 


ES221 


5473 


M00085248B:G12 


B-30525 


ES221 


5473 


M00084731C:G07 


B-30525 


ES221 


5473 


M00085701A:A09


B-30525 


ES221 


5473 


M00085246B:G12 


B-30525 


ES221 


5473 


M00084967C:D12 


B-30525 


ES221 


5473 


M00085190B:C09 


B-30525 


ES221 


5473 


M00085167C:D06 


B-30525 


ES221 


5473 


M00085705A:E01 


B-30525 


ES221 


5473 


M00085214D:G01


B-30525 


ES221 


5473 


M00084755D:E06 


B-30525 


ES221 


5473 


M00084630D:F09 


B-30525 


ES221 


5473 


M00085191A:B03 


B-30525 


ES221 


5473


M00085143C:D05 


B-30525 


ES221 


5473 


M00084886A:C06 


B-30525 


ES221 


5473 


M00083803B:F12 


B-30525 


ES221 


5473 


M00084949B:H11 


B-30525 


ES221 


5473 


M00084701C:E08 


B-30525 


ES221 


5473 


M00084945A:H10 


B-30525 


ES221 


5473 


M00084667C:A03


B-30525 


ES221 


5473 


M00084953D:D03 


B-30525 


ES221 


5473 


M00084539D:D11


B-30525 


ES221 


5473 


M00084737A:C09 


B-30525 


ES221 


5473 


M00084968C:D10 


B-30525 


ES221 


5473 


M00084670B:A09 


B-30525 


ES221 


5473 


M00085167A:G02 


B-30525 


ES221 


5473 


M00084554C:D05 


B-30525 


ES221 


5473 


M00085145C:D02 


B-30525 


ES221 


5473 


M00084722D:G04 


B-30525 


ES221 


5473 


M00084721C:F09 


B-30525 


ES221 


5473 


M00084866B:A03 


B-30525 


ES221 


5473 


M00084727A:A02 


B-30525 


ES221 


5473 


M00084407A:H09 


B-30525 


ES221 


5473 


M00084855D:H05 


B-30525 


ES221 


5473 


M00084403D:D04 


B-30525 


ES221 


5473 


M00085144D:G03 


B-30525 




ES221


5473


M00084880D:A10 


B-30525 


ES221 


5473 


M00084958C:B03 


B-30525 


ES221 


5473 


M00084888C:D12 


B-30525 


ES221 


5473 


M00084587C:G07 


B-30525 


ES221 


5473 


M00083844C:C04 


B-30525 


ES221 


5473 


M00084647D:C05 


B-30525 


ES221 


5473 


M00084528C:F06 


B-30525 




ES221


5473


M00084857D:A11 


B-30525 


ES221 


5473 


M00084385A:D02 


B-30525 


ES221 


5473 


M00084561C:D07 


B-30525 


ES221 


5473 


M00084994D:H04 


B-30525 




ES221


5473


M00084448B:D11 


B-30525 


ES221 


5473 


M00085006D:C10 


B-30525 


ES221 


5473 


M00084580B:B05


B-30525 




ES221


5473


M00083814D:A10 


B-30525 


ES221 


5473 


M00084970C:G03 


B-30525 


ES221 


5473 


M00084372D:H11 


B-30525 


ES221 


5473 


M00084377B:E11 


B-30525 


ES221 


5473 


M00085230B:G08 


B-30525 


ES221 


5473 


M00084584B:F09 


B-30525 




ES221


5473


M00084584B:H12 


B-30525 


ES221 


5473 


M00085249C:C11 


B-30525 


ES221 


5473 


M00084441B:E05 


B-30525 


ES221 


5473 


M00083841A:G01 


B-30525 


ES221


5473 


M00085006D:C04 


B-30525 


ES221 


5473 


M00084686B:B04 


B-30525 


ES221 


5473 


M00084998A:C12 


B-30525 


ES221 


5473 


M00085034B:E11 


B-30525 


ES221 


5473 


M00084683B:A01 


B-30525 


ES221


5473 


M00084613A:A01 


B-30525 


ES221 




5473


M00084633B:A06


B-30525 


ES221


5473 


M00085032C:F04 


B-30525 


ES221 


5473 


M00085022B:F05 


B-30525 


ES221 


5473 


M00084509A:E10 


B-30525 


ES221 


5473 


M00084400A:B09 


B-30525 


ES221 


5473 


M00084677C:F03 


B-30525 


ES221 


5473 


M00084427B:D01 


B-30525 


ES221 


5473 


M00083844B:C04 


B-30525 


ES221 


5473 


M00084598D:H05 


B-30525 


ES221 


5473 


M00084443B:C02 


B-30525 


ES221 


5473 


M00084514A:A03


B-30525 


ES221 


5473 


M00084560C:G05


B-30525 


ES221 


5473 


M00084504C:F05 


B-30525 


ES221 


5473 


M00084517C:D06 


B-30525 


ES221 


5473 


M00084420C:D03 


B-30525 


ES221 


5473 


M00084524D:D02 


B-30525


ES221 


5473 


M00084499D:A10


B-30525 


ES221 


5473 


M00085022B:B03 


B-30525 


ES221 


5473 


M00084958B:E10 


B-30525 


ES221 


5473 


M00084513C:C10 


B-30525 


ES221 


5473 


M00084595B:C08 


B-30525 


ES221 


5473 


M00083804A:H12 


B-30525 


ES221 


5473 


M00084859D:B03 


B-30525 
ES221


5473 


M00084641B:F08 


B-30525 


ES221 


5473 


M00084844A:E10 


B-30525 


ES221 


5473 


M00083838A:E05 


B-30525 


ES221 


5473 


M00083849C:F11 


B-30525 


ES221 


5473 


M00084461C:D06 


B-30525 


ES221 


5473 


M00084810D:B10 


B-30525 


ES221 


5473 


M00085047D:F08 


B-30525 


ES221 


5473 


M00084912D:G06 


B-30525 


ES221 


5473 


M00084645C:F07 


B-30525 


ES221 


5473 


M00084912D:A09 


B-30525 


ES221 


5473 


M00084862B:B01 


B-30525 


ES221 


5473 


M00084938C:G06


B-30525 


ES221 


5473 


M00084534B:E12 


B-30525 


ES221 


5473 


M00084909C:G02 


B-30525 


ES221 


5473 


M00084973A:A01 


B-30525 


ES221 


5473 


M00084651B:G10 


B-30525 


ES221 


5473 


M00084925A:B08


B-30525 


ES221 


5473 


M00084568D:A02


B-30525 


ES221 


5473 


M00084456A:H04 


B-30525 


ES221 


5473 


M00084988C:B01 


B-30525 


ES221 


5473 


M00084842C:B07 


B-30525 


ES221 


5473 


M00084708A:A11 


B-30525 


ES221 


5473 


M00084602C:E04 


B-30525 


ES221 


5473 


M00084757B:F11 


B-30525 


ES221 


5473 


M00084483A:C06 


B-30525 


ES221 


5473 


M00084605B:H04 


B-30525 


ES221 


5473 


M00083812C:G02 


B-30525 


ES221 


5473 


M00084610D:H04 


B-30525


ES221 


5473 


M00085056D:B12 


B-30525 


ES221 


5473 


M00085017C:A11 


B-30525 


ES221 


5473 


M00084573A:A10 


B-30525 


ES221 


5473 


M00084637B:E01 


B-30525 


ES221 


5473 


M00085056B:B06 


B-30525 


ES221 


5473 


M00084510C:H01 


B-30525 


ES221 


5473 


M00084577B:C08 


B-30525 


ES221 


5473 


M00084646B:B03 


B-30525 


ES221 


5473 


M00084844C:F04 


B-30525 


ES221 


5473 


M00084894B:F11 


B-30525 


ES221 


5473 


M00084930B:E12 


B-30525 


ES221 


5473 


M00084469B:F08 


B-30525 


ES221 


5473 


M00084569D:B04 


B-30525 


ES221 


5473 


M00084453D:B12 


B-30525 


ES221 


5473 


M00083844A:E12 


B-30525 


ES221 


5473 


M00085009D:A02 


B-30525 


ES221 


5473 


M00084619A:E04 


B-30525 


ES221 


5473 


M00085006C:C07 


B-30525 


ES222 


5474 


M00084459A:F10 


B-30526 


ES222 


5474 


M00084721B:C11 


B-30526 


ES222 


5474 


M00084454A:G08 


B-30526 


ES222 


5474 


M00084460D:B04 


B-30526 


ES222 


5474 


M00084723D:G09 


B-30526 


ES222 


5474 


M00084704A:C12 


B-30526 


ES222 


5474 


M00084487C:H06 


B-30526 


ES222 


5474 


M00084867C:G02 


B-30526 


ES222 


5474 


M00084475B:D03 


B-30526 


ES222 


5474 


M00084490A:C12 


B-30526 


ES222 


5474 


M00084865D:G02 


B-30526 


ES222 


5474 


M00084876D:A06 


B-30526 


ES222 


5474 


M00084553D:G05 


B-30526 


ES222 


5474 


M00084558D:A04 


B-30526 


ES222 


5474 


M00084645B:A06 


B-30526 


ES222 


5474 


M00084747D:G02 


B-30526 


ES222 


5474 


M00084884D:D03 


B-30526 


ES222 


5474 


M00084700A:C10


B-30526 


ES222 


5474 


M00084973A:C01 


B-30526 


ES222 


5474 


M00084493A:E03


B-30526 


ES222 


5474 


M00084497B:C12 


B-30526 


ES222 


5474 


M00084500D:B11 


B-30526 


ES222 


5474 


M00084523C:C10 


B-30526 


ES222 


5474 


M00084526D:E09 


B-30526 


ES222 


5474 


M00084923D:B05 


B-30526 


ES222 


5474 


M00084962C:F10 


B-30526 


ES222 


5474 


M00084669C:A10 


B-30526 


ES222


5474 


M00084444D:F09 


B-30526 


ES222 


5474 


M00084757B:F05 


B-30526 


ES222 


5474 


M00084922A:C08 


B-30526 


ES222 


5474 


M00084960D:D02 


B-30526 


ES222 


5474 


M00084837B:E06 


B-30526 


ES222 


5474 


M00084763D:A04 


B-30526 


ES222 


5474 


M00084651C:H01 


B-30526 


ES222 


5474 


M00084441D:E09 


B-30526 


ES222 


5474 


M00084509D:C02 


B-30526 


ES222 


5474 


M00084510D:D05 


B-30526 


ES222 


5474 


M00084657D:B10 


B-30526 


ES222 


5474 


M00084946D:H05 


B-30526 


ES222 


5474 


M00084506A:E08 


B-30526 


ES222 


5474 


M00084420D:C07 


B-30526 


ES222 


5474 


M00085247A:F05 


B-30526 


ES222 


5474 


M00085142D:F04 


B-30526 


ES222 


5474 


M00085151A:H09 


B-30526 


ES222 


5474 


M00085029D:E12 


B-30526 


ES222 


5474 


M00085141C:G06 


B-30526 


ES222 


5474 


M00085035D:D09 


B-30526 




ES222


5474


M00085168D:D04 


B-30526 


ES222 


5474 


M00084995B:B08 


B-30526 


ES222 


5474 


M00085008B:H11 


B-30526 


ES222 


5474 


M00085059B:H11 


B-30526 


ES222 


5474 


M00085125C:H06 


B-30526 


ES222 


5474 


M00084890B:E02 


B-30526 


ES222 


5474 


M00084418D:A04


B-30526 


ES222 


5474 


M00084961C:A06


B-30526 


ES222 


5474 


M00084766B:E03 


B-30526 


ES222 


5474 


M00084406A:B03 


B-30526 


ES222 


5474 


M00085686A:C05 


B-30526 


ES222 


5474 


M00085124A:G04 


B-30526 


ES222 


5474 


M00085059B:H07 


B-30526 






WO 2004/039943 



PCT/US2003/015465



Table 15 



Library ID 


CMCC Number 
CloneId


NRRL Number


ES222


5474


JVlUUWo4O40rS.JJU /


B-30526


ES222 




5474


M00084246B:D10


B-30526




ES222


5474




B-30526 


ES222 


5474 


M00084738B:A09


B-30526 


ES222 


5474 


M00085015A:C09




ES222 


5474 


M00083839B:G09


B-30526


ES222 


5474 


M00085012C:D06


B-30526




ES222


5474


M00085058A:H02


B-30526


ES222 


5474


M00085009C:C01 


B-30526 




ES222


5474


M00083818C:A02


B-30526 


ES222


5474 


\/Tnnns5nr>7'R-pn7 


B-30526 


ES222


5474 


M00085047A:H02 




ES222 


5474


M00085245C:D07


B-30526


ES222 


5474


M00085032D:C03


B-30526


ES222 


5474 


M00085039B:E09


B-30526 


ES222 


5474 


M00085053C:D07




ES222 


5474 


M00084364D:F08


B-30526 




ES222


5474


M00085152B:A06


B-30526 


ES222


5474 


M00084969C:F03 




ES222


5474 




B-30526 


ES222


5474 


\Annns/i477r i *nn4 

1V1UUU044Z /L>.1JU4 




ES222


5474 


M00085018C:B09


B-30526 


ES222


5474 


M00084668D:D08


B-30526


ES222


5474


MUUUoj 1 / jd.AUj 


B-30526 


ES222


5474 


1V1UUUo44j /L-.LfUj 


B-30526 




ES222


5474




B-30526 


ES222


5474 


AAftftftSM QQ8R- A ft 4 






ES222


5474


M00084716D:H03


B-30526 


ES222


5474


M00084708B:A06


B-30526


ES222




JVlUUUo4 1J JD.AU4 


B-30526 


ES222 


5474


M00084380C:C09


B-30526 


ES222 


5474


ivAnnnsKfti 1 a -T7ftfi 


B-30526


ES222 


5474 


AAftnftRA37AA ■ A 1 ft 


B-30526 





ES222


5474


A Aftftft S ^ 1 QAP-R 1 7 


B-30526 




ES222


5474


M00084971C:G07


B-30526 





ES222


5474


M00084515D:G03




ES222 


5474 


M00084961D:H03


B-30526


ES222 


5474 


M00084908D:B11


B-30526


ES222


5474 


M00084702A:B08 


B-30526 




ES222


5474


M00083838C:F07







ES222


5474


M00084390B:H04 


B-30526 


ES222


5474 


M00084734A:H01


B-30526


ES222


5474 


M00083817B:A11


B-30526


ES222 


5474 


M00085176C:B11


B-30526 


ES222 


5474 


M00084902C:F05


B-30526 


ES222 


5474 


M00085677A:E02 


B-30526 


ES222 


5474 


M00084948D:B08 


B-30526 


ES222 


5474 


M00085190B:H04 


B-30526 


ES222 


5474 


M00084820D:A03 


B-30526 


ES222 


5474 


M00084479B:E04 


B-30526 


ES222 


5474 


M00084408D:E06 


B-30526 


ES222 


5474 


M00085009B:F10


B-30526 


ES222 


5474 


M00085697A:F01


B-30526 
ES222


5474 


M00084423D:B05 


B-30526 




ES222 


5474 


M00084973A:A01 


B-30526 




ES222 


5474 


M00083804B:C03 


B-30526 




ES222 


5474 


M00084841B:H09 


B-30526 




ES222 


5474 


M00084685D:B11 


B-30526 




ES222 


5474 


M00084599D:C02 


B-30526 




ES222 


5474 


M00084573D:G11 


B-30526 




ES222 


5474 


M00084603A:B07


B-30526 




ES222 


5474 


M00084823D:E06 


B-30526 




ES222 


5474 


M00084565A:D10 


B-30526 




ES222 


5474 


M00084767B:F06 


B-30526 




ES222 


5474 


M00084963D:D07


B-30526 




ES222 


5474 


M00084611A:A06


B-30526 




ES222 


5474 


M00084829B:F06 


B-30526 




ES222 


5474 


M00084850C:A11


B-30526 




ES222 


5474 


M00084540D:B12 


B-30526 




ES222 


5474 


M00084614C:G05


B-30526 




ES222 


5474 


M00084826B:D12 


B-30526 


ES222 


5474 


M00084605D:G09 


B-30526 


ES222 


5474 


M00084923D:F11 


B-30526 


ES222 


5474 


M00084664D:E05 


B-30526 


ES222 


5474 


M00084533A:C04 


B-30526 


ES222 


5474 


M00084843D:F05 


B-30526 


ES222 


5474 


M00084894A:G09 


B-30526 


ES222 


5474 


M00084913B:F05 


B-30526 


ES222 


5474 


M00083817D:A08


B-30526 


ES222 


5474 


M00084451D:A03


B-30526 


ES222 


5474 


M00084675B:A04


B-30526 


ES222 


5474 


M00084889A:B07 


B-30526 


ES222 


5474 


M00084879A:A04


B-30526 


ES222 


5474 


M00084638D:A05


B-30526 


ES222 


5474 


M00084468A:A09 


B-30526 


ES222 


5474 


M00084634A:D01 


B-30526 


ES222 


5474 


M00084577B:D04 


B-30526 


ES222 


5474 


M00084860B:A01


B-30526 


ES222 


5474 


M00084567B:F03


B-30526 




ES222 


5474 


M00084619A:G10 


B-30526 




ES222 


5474 


M00084683A:B12


B-30526 




ES222 


5474 


M00084631D:G01 


B-30526 




ES222 


5474 


M00084520B:A12 


B-30526 




ES222 


5474 


M00084886B:D06 


B-30526 




ES222 


5474 


M00084727A:G09


B-30526 


ES222 


5474 


M00084393A:G07 


B-30526 


ES222 


5474 


M00084571A:C02 


B-30526 


ES222 


5474 


M00084866C:H04


B-30526 




ES222 


5474 


M00084449A:D09 


B-30526 




ES222 


5474 


M00084857C:E11 


B-30526 




ES222 


5474 


M00085226C:F08


B-30526 




ES222 


5474 


M00084392C:D03 


B-30526 




ES222 


5474 


M00084389A:F12


B-30526 




ES222 


5474 


M00084696C:A07


B-30526 




ES222 


5474 


M00084397D:A09


B-30526 




ES222 


5474 


M00085173A:B07


B-30526 
ES222


5474 


M00084252B:H01 


B-30526 


ES222 


5474 


M00084970C:H09 


B-30526 


ES222 


5474 


M00084648A:F08


B-30526 


ES222 


5474 


M00085245D:G07 


B-30526 


ES222 


5474 


M00084642C:F10 


B-30526 


ES222 


5474 


M00085220D:E06 


B-30526 


ES222 


5474 


M00084745A:H04 


B-30526 


ES222 


5474 


M00083809B:E08 


B-30526 


ES222 


5474 


M00084940B:F06 


B-30526 


ES222 


5474 


M00084533B:B10 


B-30526 


ES222 


5474 


M00084970D:B01 


B-30526 


ES222 


5474 


M00084583D:H12 


B-30526 


ES222 


5474 


M00084585D:H12 


B-30526 


ES222 


5474 


M00084581B:E06 


B-30526 


ES222 


5474 


M00084588B:D02 


B-30526 


ES222 


5474 


M00084919D:B08


B-30526 


ES222 


5474 


M00084812A:E05


B-30526 


ES222 


5474 


M00084768B:E09 


B-30526 


ES222 


5474 


M00084748A:H02 


B-30526 


ES222 


5474 


M00084519B:D01 


B-30526 




ES222


5474


M00084926B:C05


B-30526 


ES222 


5474 


M00084847B:G05 


B-30526 


ES222 


5474 


M00084858C:B01 


B-30526 


ES222 


5474 


M00084483A:E05 


B-30526 


ES222 


5474 


M00084596A:G03 


B-30526 




ES222


5474


M00084681B:G11


B-30526 


ES223 


5475 


M00085368B:A02


B-30527 




ES223


5475


M00085365C:C09 


B-30527 


ES223 


5475 


M00085317B:G09


B-30527 


ES223 


5475 


M00085732A:B09 


B-30527 


ES223 


5475 


M00085649C:A12


B-30527 


ES223 


5475 


M00085337C:E09 


B-30527 


ES223 


5475 


M00085520C:D02


B-30527 


ES223 


5475 


M00085358B:A04


B-30527 


ES223 


5475 


M00085640A:H11 


B-30527 


ES223 


5475 


M00085344D:B07 


B-30527 


ES223 


5475 


M00085314C:F01 


B-30527 


ES223 


5475 


M00085334A:D10 


B-30527 


ES223 


5475 


M00085262C:E04


B-30527 


ES223 


5475 


M00083750D:G12


B-30527 


ES223 


5475 


M00085255B:F11 


B-30527 


ES223 


5475 


M00085701D:A02 


B-30527




ES223


5475


M00086280B:G09 


B-30527 


ES223 


5475 


M00085628C:D08


B-30527 


ES223 


5475 


M00083726B:E07 


B-30527 


ES223 


5475 


M00086285B:C10 


B-30527 


ES223 


5475 


M00086084D:E12


B-30527 


ES223 


5475 


M00085446A:D04 


B-30527


ES223 


5475 


M00085697D:H11 


B-30527 


ES223 


5475 


M00086085A:H03 


B-30527 


ES223 


5475 


M00086057A:F07 


B-30527 


ES223 


5475 


M00086196A:F07


B-30527 


ES223


5475


M00086279A:B07 


B-30527 
ES223


5475 


M00086266B:E10 


B-30527 


ES223 


5475 


M00086280A:E02 


B-30527 


ES223 


5475 


M00085309A:D11 


B-30527 


ES223 


5475 


M00086291D:B08 


B-30527 


ES223 


5475 


M00085861C:A03 


B-30527 


ES223 


5475 


M00086247A:G11 


B-30527 


ES223 


5475 


M00085373C:B06 


B-30527 


ES223


5475 


M00085332D:B03 


B-30527 


ES223 


5475 


M00085632C:A06 


B-30527 


ES223 


5475 


M00085360B:G03 


B-30527 


ES223 


5475 


M00086191A:G09 


B-30527 


ES223 


5475 


M00085473C:B02 


B-30527 


ES223 


5475 


M00085600B:B03 


B-30527 


ES223 


5475 


M00083749D:B09 


B-30527 


ES223 


5475 


M00085301B:C10 


B-30527 


ES223 


5475 


M00085278D:F03 


B-30527 


ES223 


5475 


M00085296C:C09 


B-30527 


ES223 


5475 


M00085435B:C05 


B-30527 


ES223 


5475 


M00085533C:D11 


B-30527 


ES223 


5475 


M00086061B:E02 


B-30527 


ES223 


5475 


M00085284D:A09 


B-30527 


ES223 


5475 


M00085566A:D12 


B-30527 


ES223 


5475 


M00085528A:E02 


B-30527 


ES223 


5475 


M00085336D:B10 


B-30527 


ES223 


5475 


M00085315C:B03 


B-30527 


ES223 


5475 


M00085509A:A02


B-30527 


ES223 


5475 


M00083698D:E01 


B-30527 


ES223 


5475 


M00083701D:G09 


B-30527 


ES223 


5475 


M00086155A:G12


B-30527 


ES223 


5475 


M00085293B:B05 


B-30527 


ES223 


5475 


M00085728B:C08 


B-30527 


ES223 


5475 


M00085611B:D03 


B-30527 


ES223 


5475 


M00085592A:G06 


B-30527 


ES223 


5475 


M00085304A:B11 


B-30527 


ES223 


5475 


M00085266D:C09 


B-30527 


ES223 


5475 


M00085335B:D09


B-30527 


ES223 


5475 


M00085707C:A10 


B-30527 


ES223 


5475 


M00085555D:F08 


B-30527 


ES223 


5475 


M00085588B:G10 


B-30527 


ES223 


5475 


M00085264C:F04


B-30527 


ES223 


5475 


M00085733D:E05


B-30527 


ES223 


5475 


M00085647A:C08 


B-30527 


ES223 


5475 


M00083714C:F04 


B-30527 


ES223 


5475 


M00085707A:F01


B-30527 


ES223 


5475 


M00085548C:D04 


B-30527 


ES223 


5475 


M00083745A:A10 


B-30527 


ES223 


5475 


M00085396B:G04


B-30527 


ES223 


5475 


M00085449C:D04


B-30527 


ES223 


5475 


M00083698B:H01


B-30527 


ES223 


5475 


M00084772C:G12 


B-30527 


ES223 


5475 


M00086126C:D09 


B-30527 


ES223 


5475 


M00085808D:E01 


B-30527 


ES223 


5475 


M00085927A:F06 


B-30527 
ES223


5475


M00085814C:C12 


B-30527 


ES223 


5475 


M00086015A:B03 


B-30527 


ES223 


5475 


M00086146D:A09 


B-30527 


ES223 


5475 


M00085076C:A07 


B-30527 


ES223 


5475 


M00085427C:A04


B-30527 


ES223 


5475 


M00084774B:C10 


B-30527 


ES223 


5475 


M00085432A:H08 


B-30527 


ES223 


5475 


M00084796D:B01 


B-30527 


ES223 


5475 


M00086127C:C05 


B-30527 


ES223 


5475 


M00086160D:F08 


B-30527 


ES223 


5475 


M00084802A:H09 


B-30527 


ES223 


5475 


M00086081D:H11 


B-30527 


ES223 


5475 


M00086106D:H01 


B-30527 




ES223


5475


M00086159A:F05 


B-30527 


ES223 


5475 


M00084782D:H08 


B-30527 


ES223 


5475 


M00085956D:G04 


B-30527 




ES223


5475


M00085770C:A12


B-30527 


ES223 


5475 


M00086008D:F08 


B-30527 


ES223 


5475 


M00086018A:A05


B-30527 


ES223 


5475 


M00085761A:B03 


B-30527 




ES223


5475


M00085751A:A11


B-30527 


ES223 


5475 


M00085956B:E08 


B-30527 


ES223


5475 


M00085955C:C03 


B-30527 


ES223


5475 


M00085904D:D02 


B-30527 




ES223


5475


M00085899C:G10 


B-30527 


ES223 


5475 


M00085927C:G10 


B-30527 




ES223


5475


M00085896D:A11


B-30527 


ES223 


5475 


M00085892A:F04


B-30527 


ES223 


5475 


M00085882B:F11 


B-30527 


ES223 


5475 


M00085419A:G09 


B-30527 


ES223 


5475 


M00085962B:A12 


B-30527 


ES223 


5475 


M00085811B:D12 


B-30527 


ES223 


5475 


M00085986B:H02 


B-30527 


ES223 


5475 


M00085922A:A08


B-30527 


ES223 


5475 


M00085854C:F04 


B-30527 




ES223


5475


M00085835D:F06 


B-30527 


ES223 


5475 


M00086183A:H04 


B-30527 


ES223 


5475 


M00086193A:F04 


B-30527 


ES223 


5475 


M00086197B:A03 


B-30527 


ES223


5475 


M00086203C:H04 


B-30527 





ES223


5475


M00085827D:D01 


B-30527 


ES223 


5475 


M00086097D:D12 


B-30527 


ES223 


5475 


M00086294C:G05 


B-30527 


ES223 


5475 


M00086176C:A06


B-30527 


ES223 


5475 


M00085825C:D12 


B-30527 


ES223 


5475 


M00085849D:G06 


B-30527 


ES223 


5475 


M00085817C:C10 


B-30527 


ES223 


5475 


M00085807D:G11 


B-30527 


ES223 


5475 


M00085839B:B12 


B-30527 


ES223 


5475 


M00085750A:G03 


B-30527 


ES223 


5475 


M00086248A:H09


B-30527 


ES223 


5475 


M00085964A:B11 


B-30527 


ES223 


5475 


M00086003D:G08


B-30527 
B-30527


ES223 




5475


M00085066B:D12


B-30527 


ES223 


5475


M00084775C:E05


B-30527 


ES223


5475


M00085083A:E04








AvTAAAG *\ AQ AP • O AQ 


B-30527 


ES223


5475


A/TAAA B ^ 1 AA A 'tTA7 
ivlUUUoJ lUUA.rlU / 


B-30527 




ES223


5475


AfAAAfi^ 1 A1 C^-IKYX 


B-30527


ES223


5475


M00085105D:H02


B-30527 




ES223


5475


A /T A A A fi £. 'X 0 X R ■ P (\A 




ES223


5475


M00084808D:A07


B-30527




ES223


5475


A/TAAAQ^AA^rVRI A 


B-30527 


ES223


5475


A,f AAAS/l7R7R*r*1 7 






ES223


5475


A/fAAAS^A^RP-RfW 


B-30527 


ES223


5475


A/f AAAC/\70 1 A -P 1 ft 


B-30527 


ES223


5475


A/fAAAQ/1 QA7TVT7A7 
MUUU54&U /D.rU / 


B-30527 


ES223


5475


A/fAAAB/1'7QQP-T7AX 


B-30527 


ES223


5475


M00084781A:E05 




ES223 


5475 


M00086324D:D01 


B-30527 


ES224 


5476


aaaaab«;A41 p*p.ai 


B-30528 


ES224 





5476


M00085623B:G11


B-30528


ES224 


5476


A>fAAAQCQncA-nft7 


B-30528 





ES224


5476


AAAAA0^C/1^A-U1A 

MUWoJo4oA.tilU 


B-30528


ES224


5476




A/fAAAR^AQTVnAA 


B-30528 


ES224


5476


x^aaak^h 1 7P«nns 

MUUUoJol /CLiUJ 


B-30528 




ES224


5476


A/fAnAS^A'} ^P'WA7 


B-30528 


ES224


5476


AAAAAH £ \QA7R-R1 1 


B-30528 


ES224


5476


AAAnAS'xA^ATVTn 7 


B-30528 


ES224 


5476


A/fAAAC^C^/l A -T^AQ 


B-30528 


ES224 


5476


M00085628B:D03


B-30528


ES224 


5476


A/f AAAC^SQ^ A -TVW* 


B-30528 


ES224 





A4AAAK^A9Ar , *T7A^ 


B-30528






AAAAAS^7^AR*P1 ft 




ES224





l\AAAAft^773 A -VOt^ 


B-30528 




ES224


5476


AAAAAS ^O^^R'RftfA 


B-30528 


ES224


5476


M00084804B:E01


B-30528


ES224 


5476


MUWJojjj /xJ.rUo 


B-30528


ES224 


5476


M00085349A:C08


B-30528 


ES224 





MUUU04 / oJV^. J\\Jy 


B-30528


ES224 


5476


AAAnASA7R7TVni A 
MUUUo4 / oZJJ.JJ 1U 




ES224 


5476


A/fAAAS^77'5 A - A AQ 


B-30528 


ES224 


5476


AyTAAAQ/ITO/IR .m 1 


B-30528


ES224 


5476


AAAAACyl "7C AT^-T^AT 


B-30528 


ES224 


5476


AvTAAAO*:*2,1 ^R-PAO 


B-30528 


ES224 




5476


M00085344D:F01


B-30528 




ES224


5476


M00085747A:B11 


B-30528 


ES224 


5476 


M00085814A:C02


B-30528 


ES224 


5476 


M00085503D:D05 


B-30528 


ES224 


5476 


M00085304B:D11


B-30528 


ES224 


5476 


M00085900B:E02 


B-30528 


ES224 


5476 


M00085859B:A11


B-30528 


ES224 


5476 


M00085860D:H02


B-30528 


ES224 


5476 


M00085649B:A03 


B-30528 


ES224


5476 


M00085815C:E11 


B-30528 
ES224


5476


M00085941B:A06


B-30528 


ES224 


5476 


M00084800B:H09


B-30528 


ES224 


5476 


M00085919B:F02


B-30528 




ES224


5476


M00085449A:E02 


B-30528 


ES224 


5476 


M00085344A:G08 


B-30528 




ES224


5476


M00085520D:B11 


B-30528 


ES224 


5476 


M00086035A:C11 


B-30528 


ES224 


5476 


M00085955D:H10 


B-30528 


ES224 


5476 


M00086175C:D06 


B-30528 


ES224 


5476 


M00085510B:G12 


B-30528 


ES224 


5476 


M00085927C:G11 


B-30528 


ES224 


5476 


M00085919A:A05 


B-30528 


ES224 


5476 


M00085985C:D02 


B-30528 


ES224


5476 


M00085934B:E12 


B-30528 


ES224 


5476 


M00085389C:H04 


B-30528 


ES224 


5476 


M00085406A:F03


B-30528 


ES224 


5476 


M00085826D:B03 


B-30528 


ES224


5476 


M00085819C:F06 


B-30528 


ES224 


5476 


M00085266B:C06 


B-30528 


ES224


5476 


M00086112C:A01 


B-30528 


ES224


5476 


M00086005A:B02 


B-30528 


ES224 


5476 


M00085548D:F01


B-30528 


ES224 


5476 


M00085809A:E04 


B-30528 


ES224 


5476 


M00085389B:B05 


B-30528 


ES224


5476 


M00085761C:E07 


B-30528 


ES224 


5476 


M00085454A:G06 


B-30528 


ES224 


5476 


M00085980B:F06 


B-30528 


ES224 


5476 


M00085367B:C02 


B-30528 


ES224 


5476 


M00085922A:E10 


B-30528 


ES224 


5476 


M00085830B:E09 


B-30528 


ES224 


5476 


M00085611C:D09


B-30528 


ES224 


5476 


M00085810A:A10


B-30528 


ES224 


5476 


M00085534D:H09 


B-30528 




ES224


5476


M00085390D:A03 


B-30528 


ES224 


5476 


M00085419C:H05 


B-30528 


ES224 


5476 


M00085441D:H10 


B-30528 


ES224 


5476 


M00085434C:G06 


B-30528 


ES224 


5476 


M00085428B:G02 


B-30528 


ES224 


5476 


M00086225B:E01 


B-30528 


ES224


5476 


M00085835B:E11 


B-30528 


ES224 




5476


M00085590A:G06


B-30528 


ES224


5476 


M00086322D:D05 


B-30528 


ES224


5476


M00085255D:E12


B-30528 


ES224 


5476


M00086259A:F11 


B-30528 


ES224 


5476 


M00086233C:F01 


B-30528 


ES224 


5476 


M00086038A:D03 


B-30528 


ES224 


5476 


M00085569B:C09 


B-30528 


ES224 


5476 


M00084804C:H10 


B-30528 


ES224 


5476 


M00086000A:C05 


B-30528 


ES224 


5476 


M00085605A:D08 


B-30528 


ES224 


5476 


M00083706A:D02 


B-30528 


ES224 


5476 


M00086202C:A07


B-30528 


ES224 


5476 


M00083691C:E12 


B-30528 
ES224


5476


M00083740C:G01


B-30528


ES224 


5476


M00086159A:F03 


B-30528 


ES224 


5476 


M00085323C:H07 


B-30528 


ES224 


5476 


M00085326B:D08 


B-30528 


ES224 


5476 


M00086270C:D08 


B-30528 


ES224 


5476 


M00086027A:G04 


B-30528 


ES224 


5476 


M00085691A:E06 


B-30528 


ES224 


5476 


M00086276D:F10 


B-30528 


ES224


5476 


M00086060C:F04 


B-30528 


ES224 


5476 


M00086206A:E10 


B-30528 


ES224 


5476 


M00086087C:D04 


B-30528 


ES224 


5476 


M00083699B:C12 


B-30528 





ES224


5476


M00086076B:D02 


B-30528 


ES224


5476


M00086288A:B05 


B-30528 


ES224


5476 


M00085252B:H07 


B-30528 


ES224


5476


M00086166D:H04 


B-30528 


ES224


5476 


M00086184C:D04 


B-30528 


ES224


5476


M00085555A:A06 


B-30528 


ES224


5476


M00085733D:F08 


B-30528 


ES224


5476 


M00085887C:B03 


B-30528 


ES224


5476 


M00086155D:E12 


B-30528 


ES224


5476 


M00086279C:A08 


B-30528 


ES224


5476 


M00083710D:B09 


B-30528 





ES224


5476


M00086228A:F11 


B-30528 


ES224


5476


M00086145A:F07


B-30528 




ES224


5476


M00085100B:C12 


B-30528 


ES224


5476 


M00085287C:G08 


B-30528 


ES224


5476 


M00083729C:F10 


B-30528 


ES224


5476 


M00083692C:D09 


B-30528 


ES224


5476 


M00086090B:B09 


B-30528 


ES224


5476 


M00086120A:D11 


B-30528 


ES224


5476 


M00086031C:G11 


B-30528 


ES224 


5476 


M00085294D:G03 


B-30528 


ES224


5476


M00086114D:G11 


B-30528 


ES224


5476


M00085720D:A03


B-30528 


ES224


5476


M00085262A:A02 




ES224


5476


M00085732A:G04


B-30528 




ES224


5476


M00086159B:E04 




ES224


5476


M00084781A:H09


B-30528 




ES224


5476


M00086235A:F05


B-30528 


ES224


5476


M00086097C:B10


B-30528 


ES224


5476


M00084779B:D03


B-30528 






IVIUUU0OU0 JLJ.rUy 


B-30528 


ES224 


5476


M00086286C:H02


B-30528 


ES224 


5476 


M00083733B:D08 


B-30528 


ES224 


5476 


M00085322D:G05 


B-30528 


ES224 


5476 


M00085071B:D07


B-30528 


ES224 


5476 


M00085707A:A04


B-30528 


ES224 


5476 


M00083736A:D10 


B-30528 


ES224 


5476


M00085301C:H11


B-30528 


ES224 


5476 


M00085066D:A05 


B-30528 


ES224 


5476 


M00086294D:F08 


B-30528 


ES224 


5476 


M00084773C:D04 


B-30528 
ES224


5476 


M00085280D:F06 


B-30528 


ES224 


5476 


M00086302A:E06 


B-30528 


ES224 


5476 


M00085084A:E12 


B-30528 


ES224 


5476 


M00085105C:D01 


B-30528 


ES224 


5476 


M00085297A:A11 


B-30528 


ES224 


5476 


M00085076C:H01


B-30528 


ES224 


5476 


M00086286B:F08 


B-30528 


ES224 


5476 


M00085107D:H08 


B-30528 


ES224 


5476 


M00085092D:D09 


B-30528 


ES225 


5477 


M00086103D:E08 


B-30529 


ES225 


5477 


M00086272D:E04 


B-30529 


ES225 


5477 


M00085449B:G12 


B-30529 


ES225 


5477 


M00085832D:G02 


B-30529 


ES225 


5477 


M00085827C:F03 


B-30529 


ES225 


5477 


M00086328B:G12 


B-30529 


ES225 


5477 


M00085820D:D02 


B-30529 


ES225 


5477 


M00085522C:E05 


B-30529 


ES225 


5477 


M00086178A:D07 


B-30529 


ES225 


5477 


M00085651A:A01 


B-30529 


ES225 


5477 


M00085814B:G08 


B-30529 


ES225 


5477 


M00086149A:F06 


B-30529 


ES225 


5477 


M00085754C:A12 


B-30529 


ES225 


5477 


M00085383C:C05 


B-30529 


ES225 


5477 


M00086206C:C01


B-30529 


ES225 


5477 


M00085982D:C06 


B-30529 


ES225 


5477 


M00085590B:F11 


B-30529 


ES225 


5477 


M00086161B:H08 


B-30529 


ES225 


5477 


M00086277B:E06 


B-30529 


ES225 


5477 


M00085305B:F10 


B-30529 


ES225 


5477 


M00086081B:A06 


B-30529 


ES225 


5477 


M00086053B:F01


B-30529 


ES225 


5477 


M00085259A:H05 


B-30529 


ES225 


5477 


M00083694C:F09 


B-30529 


ES225 


5477 


M00085643D:F06


B-30529 


ES225 


5477 


M00086209B:H12 


B-30529 


ES225 


5477 


M00085860B:D10


B-30529 


ES225 


5477 


M00086178D:H12 


B-30529 


ES225 


5477 


M00085557B:B02 


B-30529 


ES225 


5477 


M00086184C:C10 


B-30529 


ES225 


5477 


M00086227B:E06


B-30529 


ES225 


5477 


M00085603A:E01 


B-30529 


ES225 


5477 


M00086233D:A03


B-30529 


ES225 


5477 


M00085817D:C04


B-30529 


ES225 


5477 


M00086157D:D03


B-30529 


ES225 


5477 


M00085583A:E06 


B-30529 


ES225 


5477 


M00086136C:B06 


B-30529 


ES225 


5477 


M00085626B:B11


B-30529 


ES225 


5477 


M00086048D:H08 


B-30529 


ES225 


5477 


M00086010B:B05 


B-30529 


ES225 


5477 


M00083735C:H12 


B-30529 


ES225 


5477 


M00083722B:A07


B-30529 


ES225 


5477 


M00085745C:C03 


B-30529 


ES225 


5477 


M00085809A:C05 


B-30529 


ES225 


5477 


M00085694D:D06 


B-30529 


ES225 


5477 


M00085786D:G12 


B-30529 


ES225 


5477 


M00085764B:H12 


B-30529 


ES225 


5477 


M00086000C:B08 


B-30529 


ES225 


5477 


M00085817B:B08 


B-30529 


ES225 


5477


M00085806B:A10 


B-30529 


ES225 


5477 


M00086081D:D09 


B-30529 


ES225 


5477 


M00085808C:E12 


B-30529 


ES225 


5477 


M00085336C:G09 


B-30529 


ES225 


5477 


M00085620B:D08 


B-30529 


ES225 


5477 


M00085721A:G03 


B-30529 


ES225 


5477 


M00086128D:H10 


B-30529 


ES225 


5477 


M00085361A:A09 


B-30529 


ES225 


5477 


M00085714C:G03 


B-30529 


ES225 


5477 


M00085741C:D06 


B-30529 


ES225 


5477 


M00085722A:A06


B-30529 


ES225 


5477 


M00083704C:C04 


B-30529 


ES225 


5477 


M00085549D:G03 


B-30529 


ES225 


5477 


M00086143B:B08 


B-30529 


ES225 


5477 


M00085926C:C06 


B-30529 


ES225 


5477 


M00085980A:G10 


B-30529 


ES225 


5477 


M00085625B:F01 


B-30529 


ES225 


5477 


M00086128A:D09 


B-30529 


ES225 


5477 


M00085393D:F12 


B-30529 


ES225 


5477 


M00085935D:H04 


B-30529 


ES225 


5477 


M00086159D:D01 


B-30529 


ES225 


5477 


M00085597C:C03 


B-30529 


ES225 


5477 


M00085259D:B06 


B-30529 


ES225 


5477 


M00086015D:G04 


B-30529 


ES225 


5477 


M00085255D:B09 


B-30529 


ES225 


5477 


M00083715A:B11 


B-30529 


ES225 


5477 


M00085959B:D04 


B-30529 


ES225 


5477 


M00085380B:E10


B-30529 


ES225 


5477 


M00085100A:A12


B-30529 


ES225 


5477 


M00085350D:G05 


B-30529 


ES225 


5477 


M00086301A:D04 


B-30529 


ES225 


5477 


M00085891D:E07 


B-30529 


ES225 


5477 


M00085108A:C12 


B-30529 


ES225 


5477 


M00085085C:D10 


B-30529 


ES225 


5477 


M00085104C:A10 


B-30529 


ES225 


5477 


M00084805D:E02


B-30529 


ES225 


5477 


M00084789D:B10 


B-30529 


ES225 


5477 


M00085277D:C11 


B-30529 


ES225 


5477 


M00085647A:E04 


B-30529 


ES225 


5477 


M00085886C:F05 


B-30529 


ES225 


5477 


M00085444C:C02 


B-30529 


ES225 


5477 


M00085415A:D12 


B-30529 


ES225 


5477 


M00085435A:B11 


B-30529 


ES225 


5477 


M00085282C:F05 


B-30529 


ES225 


5477 


M00084784B:A02 


B-30529 


ES225 


5477 


M00085944A:F04


B-30529 


ES225 


5477 


M00085804B:F09 


B-30529 


ES225 


5477 


M00085339C:C04 


B-30529 
Table 15



Library ID 


CMCC Number 


Clone ID


NRRL Number 


ES225 


5477 


M00085968B:F09


B-30529 


ES225 


5477 


M00084779D:H10


B-30529 


ES225 


5477 


M00085381C:A05 


B-30529 


ES225 


5477 


M00084783C:G08 


B-30529 


ES225 


5477 


M00085333B:B10 


B-30529 


ES225 


5477 


M00085068B:A07 


B-30529 


ES225 


5477 


M00084773C:H08


B-30529 


ES225 


5477 


M00086020C:E09 


B-30529 


ES225 


5477 


M00085273B:F08


B-30529 


ES225 


5477 


M00085361D:F06


B-30529 


ES225 


5477 


M00084780C:F08 


B-30529 


ES225 


5477 


M00085302C:B06 


B-30529 


ES225 


5477 


M00085988D:F09


B-30529 


ES225 


5477 


M00083738C:D05


B-30529 


ES225 


5477 


M00085331C:C07 


B-30529 


ES225 


5477 


M00085541C:F06 


B-30529 


ES225 


5477 


M00086327B:D10 


B-30529 


ES225 


5477 


M00085917C:A04 


B-30529 


ES225 


5477 


M00086310A:F04 


B-30529 


ES225 


5477 


M00085510C:A07


B-30529 


ES225 


5477 


M00085296D:H10


B-30529 


ES225 


5477 


M00086085A:G05 


B-30529 


ES225 


5477 


M00085299B:G01 


B-30529 


ES225 


5477 


M00085503D:G05 


B-30529 


ES225 


5477 


M00085260C:A10 


B-30529 


ES225 


5477 


M00086282A:B10 


B-30529 


ES225 


5477 


M00085837C:H08 


B-30529 


ES225 


5477 


M00085630B:B09 


B-30529 


ES225 


5477 


M00086279D:C07 


B-30529 


ES225 


5477 


M00085263C:F12 


B-30529 


ES225 


5477 


M00086294B:E11


B-30529 


ES225 


5477 


M00086322A:E02 


B-30529 


ES225 


5477 


M00085854C:E06


B-30529 


ES225 


5477 


M00086272D:H11 


B-30529 


ES225 


5477 


M00085422D:D07 


B-30529 


ES225 


5477 


M00085844C:H11 


B-30529 


ES225 


5477 


M00085288C:C09


B-30529 


ES225 


5477 


M00085082C:B04 


B-30529 


ES225 


5477 


M00085103D:H12 


B-30529 


ES225 


5477 


M00086336B:A08 


B-30529 


ES225 


5477 


M00084773C:F08 


B-30529 


ES225 


5477


M00085896D:A09 


B-30529 


ES225 


5477 


M00085324B:F10 


B-30529 


ES225 


5477 


M00085267B:D06 


B-30529 


ES225 


5477 


M00085430C:E04 


B-30529 


ES225 


5477 


M00085312C:B09 


B-30529 


ES225 


5477 


M00085074B:A07 


B-30529 


ES225 


5477 


M00085918D:C11 


B-30529 


ES225 


5477 


M00085341C:H08 


B-30529 


ES225 


5477 


M00084791D:D01 


B-30529 


ES225 


5477 


M00085471B:H09 


B-30529 


ES225 


5477 


M00085893B:D08 


B-30529 


ES225 


5477 


M00084781A:A05


B-30529 






ES226


5478


M00084956B:B05


B-30581 


ES226


5478


M00084954D:D12 


B-30581


ES226


5478


M00084948B:F04 


B-30581 


ES226 


5478 


M00084950D:F05 


B-30581 


ES226 


5478 


M00084954D:E01 


B-30581 


ES226 


5478 


M00084941D:C10 


B-30581 


ES226 


5478 


M00084950D:A06 


B-30581 


ES226 


5478 


M00084941D:H02 


B-30581 


ES226 


5478 


M00084954C:B12


B-30581 


ES226 


5478 


M00084955A:E08 


B-30581 


ES226 


5478 


M00084954D:A05 


B-30581 


ES226 


5478 


M00084951A:D04 


B-30581 


ES226 


5478 


M00084954C:A03


B-30581 
We Claim:

1. An isolated polynucleotide comprising a nucleotide sequence which hybridizes under
stringent conditions to a sequence selected from the group consisting of SEQ ID NOS: 1-1485, or
complement thereof.

2. An isolated polynucleotide comprising at least 15 contiguous nucleotides of a nucleotide
sequence having at least 90% sequence identity to a sequence selected from the group consisting of 
SEQ ID NOS: 1-1485, or complement thereof. 

3. An isolated polynucleotide comprising at least 15 contiguous nucleotides of a nucleotide 
sequence selected from the group consisting of SEQ ID NOS: 1-1485, or complement thereof.

4. The isolated polynucleotide of any one of claims 1-3, wherein the polynucleotide 
comprises at least 100 contiguous nucleotides of the nucleotide sequence or complement thereof. 

5. The isolated polynucleotide of any one of claims 1-4, wherein the polynucleotide 
comprises at least 200 contiguous nucleotides of the selected nucleotide sequence or complement 
thereof. 

6. An isolated polynucleotide comprising a nucleotide sequence of at least 90% sequence 
identity to a sequence selected from the group consisting of: SEQ ID NOS: 1-1485 or complement 
thereof.

7. The isolated polynucleotide of claim 6, wherein the polynucleotide comprises a 
nucleotide sequence of at least 95% sequence identity to the selected nucleotide sequence. 

8. The isolated polynucleotide of claim 6, wherein the polynucleotide comprises a 
nucleotide sequence that is identical to the selected nucleotide sequence. 

9. A polynucleotide comprising a nucleotide sequence of an insert contained in a clone 
deposited as NRRL Accession No. B-30523, B-30524, B-30525, B-30526, B-30527, B-30528,
B-30529, or B-30581.

10. An isolated cDNA obtained by the process of amplification using a polynucleotide 
comprising at least 15 contiguous nucleotides of a nucleotide sequence selected from the group 
consisting of SEQ ID NOS: 1-1485.
11. The isolated cDNA of claim 10, wherein the polynucleotide comprises at least 25
contiguous nucleotides of the selected nucleotide sequence. 

12. The isolated cDNA of claim 10, wherein the polynucleotide comprises at least 100 
contiguous nucleotides of the selected nucleotide sequence.

13. The isolated cDNA of claims 10, 11, or 12, wherein amplification is by polymerase chain 
reaction (PCR) amplification. 

14. An isolated recombinant host cell containing the polynucleotide according to claims 1, 2,
3, 6, 9, or 10.

15. An isolated vector comprising the polynucleotide according to claims 1, 2, 3, 6, 9, or 10. 

16. A method for producing a polypeptide, the method comprising the steps of:

culturing a recombinant host cell containing the polynucleotide according to claims 1, 2, 3, 6, 
9, or 10, said culturing being under conditions suitable for the expression of an encoded polypeptide;
and 

recovering the polypeptide from the host cell culture. 


17. An isolated polypeptide encoded by the polynucleotide according to claims 1, 2, 3, 6, 9,
or 10.

18. An isolated polypeptide comprising an amino acid sequence selected from the group 
consisting of SEQ ID NOS: 1486-1542.

19. An antibody that specifically binds the polypeptide of claim 17 or 18. 

20. A library of polynucleotides, wherein at least one of the polynucleotides comprises the 
sequence information of the polynucleotide according to claims 1, 2, 3, 6, 9, or 10.

21. The library of claim 20, wherein the library is provided on a nucleic acid array.

22. The library of claim 20, wherein the library is provided in a computer-readable format. 


23. A method for detecting a cancerous cell, said method comprising: 
detecting a level of a product of a gene in a test sample obtained from a cell of a subject,
wherein said gene is identified by a sequence having at least 80% sequence identity to a 
sequence selected from a group consisting of SEQ ID NOS: 1-1485, or a fragment thereof; and, 

comparing the level of said product to a control level of said gene product, 
wherein the presence of a cancerous cell is indicated by detection of said level and comparison to
a control level of said gene product. 

24. The method of claim 23, wherein said gene product is nucleic acid. 

25. The method of claim 23, wherein said detecting step uses a polymerase chain
reaction.

26. The method of claim 23, wherein said detecting step uses hybridization. 

27. The method of claim 23, wherein said sample is a sample of prostate, colon or breast
tissue.

28. A method for inhibiting a cancerous phenotype of a cell, said method comprising: 
contacting a mammalian cell with an agent for inhibition of a product of a gene, wherein
said gene is identified by a sequence having at least 80% sequence identity to a sequence
selected from a group consisting of SEQ ID NOS: 1-1485, or a fragment thereof. 

29. The method of claim 28, wherein said cancerous phenotype is aberrant cellular 
proliferation relative to a normal cell. 


30. A method of treating a subject with cancer, said method comprising: 
administering to a subject a pharmaceutically effective amount of an agent, 

wherein said agent modulates the activity of a product of a gene identified by a sequence 
having at least 80% sequence identity to a sequence selected from a group consisting of SEQ ID 
NOS: 1-1485, or a fragment thereof.



31. A method for identifying an agent that modulates a biological activity of a gene
product differentially expressed in a cancerous cell as compared to a normal cell, said method 
comprising: 
contacting a candidate agent with a product of a gene encoded by a gene defined by a
sequence having at least 80% sequence identity to a sequence selected from a group consisting 
of SEQ ID NOS: 1-1485, or a fragment thereof; and

detecting modulation of a biological activity of the gene product relative to a level of 
biological activity of the gene product in the absence of the candidate agent. 



